
Publication

Bootstrapped Extraction of Index Terms from Normalized User-Generated Content

Piroska Lendvai; Thierry Declerck
In: Michael Beißwenger; Torsten Zesch (eds.). Proceedings of the 2nd Workshop on Natural Language Processing for Computer-mediated Communication and Social Media. Workshop on Natural Language Processing for Computer-mediated Communication and Social Media (NLP4CMC-15), located at International Conference of the German Society for Computational Linguistics and Language Technology - GSCL 2015, September 29, Essen - Duisburg, Germany, Pages 44-48, GSCL, 2015.

Abstract

We report on the extraction of key phrases for news events, based on string alignment between social media posts and user-linked web documents. Hashtag normalization is tested for enhancing string similarity, while both token-based tweet similarity and manual event annotations are tested for transferring web links to posts that do not refer to external documents. We are able to identify more terms via web link transfer compared to no link transfer, and obtain syntactically and semantically more complex terms compared to general document-based term extraction.
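
The abstract names two steps that a small example may make more concrete: hashtag normalization and token-based tweet similarity used to transfer web links to posts without one. The following Python code is a minimal sketch under stated assumptions, not the authors' implementation. The CamelCase splitting heuristic, the Jaccard measure, the 0.5 threshold, the function names, and the example posts are all hypothetical choices made for illustration; the abstract does not specify any of them.

```python
import re

def normalize_hashtag(tag):
    """Split a CamelCase hashtag into lowercase words (assumed heuristic)."""
    body = tag.lstrip("#")
    # Acronym runs, capitalized or lowercase words, and digit runs.
    parts = re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", body)
    return " ".join(p.lower() for p in parts) if parts else body.lower()

def tokens(text):
    """Token set of a post after dropping URLs and normalizing hashtags."""
    text = re.sub(r"https?://\S+", " ", text)
    text = re.sub(r"#\w+", lambda m: normalize_hashtag(m.group()), text)
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def jaccard(a, b):
    """Jaccard overlap of two token sets (assumed similarity measure)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)

def transfer_links(posts, threshold=0.5):
    """Give each unlinked post the URL of its most similar linked post.

    posts: list of dicts with a 'text' key and an optional 'url' key.
    Returns one URL (or None) per post, in input order.
    """
    linked = [(tokens(p["text"]), p["url"]) for p in posts if p.get("url")]
    result = []
    for p in posts:
        if p.get("url"):
            result.append(p["url"])
            continue
        toks = tokens(p["text"])
        best_url, best_score = None, 0.0
        for linked_toks, url in linked:
            score = jaccard(toks, linked_toks)
            if score > best_score:
                best_url, best_score = url, score
        result.append(best_url if best_score >= threshold else None)
    return result

# Hypothetical example posts; the URL is a placeholder.
posts = [
    {"text": "Hostages held in #SydneySiege cafe", "url": "http://example.org/a"},
    {"text": "hostages held in Sydney siege"},
    {"text": "nice weather today"},
]
links = transfer_links(posts)
```

In this sketch the second post inherits the link of the first because normalizing `#SydneySiege` exposes the shared tokens "sydney" and "siege", while the third post shares no tokens with any linked post and stays unlinked.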

Projects