Publication

Bootstrapped Extraction of Index Terms from Normalized User-Generated Content

Piroska Lendvai, Thierry Declerck

In: Michael Beißwenger, Torsten Zesch (editors). Proceedings of the 2nd Workshop on Natural Language Processing for Computer-mediated Communication and Social Media. Workshop on Natural Language Processing for Computer-mediated Communication and Social Media (NLP4CMC-15), located at the International Conference of the German Society for Computational Linguistics and Language - GSCL 2015, September 29, Essen - Duisburg, Germany. Pages 44-48. GSCL, 2015.

Abstract

We report on the extraction of key phrases for news events, based on string alignment between social media posts and user-linked web documents. Hashtag normalization is tested for enhancing string similarity, while both token-based tweet similarity and manual event annotations are tested for transferring web links to posts that do not refer to external documents. We are able to identify more terms via web link transfer compared to no link transfer, and obtain syntactically and semantically more complex terms compared to general document-based term extraction.
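The link-transfer idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the similarity measure, so Jaccard overlap on token sets, the CamelCase hashtag splitter, the 0.5 threshold, and all function names (`normalize_hashtag`, `tokenize`, `transfer_links`) are assumptions made for illustration only.

```python
import re


def normalize_hashtag(token):
    """Split a CamelCase hashtag into lowercase words (assumed normalization)."""
    if not token.startswith("#"):
        return [token.lower()]
    body = token[1:]
    parts = re.findall(r"[A-Z][a-z]+|[a-z]+|[A-Z]+|\d+", body)
    return [p.lower() for p in parts] or [body.lower()]


def tokenize(post):
    """Return the set of normalized tokens in a post, skipping URLs."""
    tokens = set()
    for raw in post.split():
        if raw.startswith("http"):
            continue
        cleaned = raw.strip(".,!?:;\"'()")
        if cleaned:
            tokens.update(normalize_hashtag(cleaned))
    return tokens


def jaccard(a, b):
    """Token-set overlap; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transfer_links(linked, unlinked, threshold=0.5):
    """Assign each unlinked post the URL of its most similar linked post.

    linked: list of (text, url) pairs; unlinked: list of texts.
    Posts whose best similarity is below the threshold get None.
    """
    result = {}
    for post in unlinked:
        post_tokens = tokenize(post)
        best_url, best_score = None, 0.0
        for text, url in linked:
            score = jaccard(post_tokens, tokenize(text))
            if score > best_score:
                best_url, best_score = url, score
        result[post] = best_url if best_score >= threshold else None
    return result
```

In this sketch, a post such as "sydney siege hostages held in cafe" would inherit the link of "Hostages held in #SydneySiege cafe" because hashtag splitting makes the two token sets identical, whereas an unrelated post receives no link.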

Projects

cmc@gscl_paper.pdf (pdf, 145 KB)

German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz