DFKI-LT - Bootstrapped Extraction of Index Terms from Normalized User-Generated Content

Piroska Lendvai, Thierry Declerck
Bootstrapped Extraction of Index Terms from Normalized User-Generated Content
in: Michael Beißwenger, Torsten Zesch (eds.):
Proceedings of the 2nd Workshop on Natural Language Processing for Computer-mediated Communication and Social Media, Pages 44-48, Essen - Duisburg, Germany, GSCL, 2015
 
We report on the extraction of key phrases for news events, based on string alignment between social media posts and user-linked web documents. Hashtag normalization is tested for enhancing string similarity, while both token-based tweet similarity and manual event annotations are tested for transferring web links to posts that do not refer to external documents. We are able to identify more terms via web link transfer compared to no link transfer, and obtain syntactically and semantically more complex terms compared to general document-based term extraction.
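
The two preprocessing ideas named in the abstract, hashtag normalization and token-based similarity for transferring web links, can be illustrated with a minimal sketch. It is not the authors' implementation: the camel-case splitting rule, the Jaccard measure, the threshold, and all names (`normalize_hashtag`, `tokens`, `transfer_links`) are assumptions made for illustration only.

```python
import re


def normalize_hashtag(token):
    """Lowercase a token; split camel-cased hashtags into words (assumed rule)."""
    if not token.startswith("#"):
        return token.lower()
    # Split on capitalised words, all-caps runs, and digit runs.
    parts = re.findall(r"[A-Z]?[a-z]+|[A-Z]+|\d+", token[1:])
    return " ".join(p.lower() for p in parts) if parts else token[1:].lower()


def tokens(text):
    """Return the set of normalized word tokens of a post, skipping URLs."""
    out = set()
    for raw in text.split():
        if raw.startswith("http"):
            continue
        out.update(re.findall(r"[a-z0-9]+", normalize_hashtag(raw)))
    return out


def jaccard(a, b):
    """Token-set overlap; one possible token-based similarity measure."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def transfer_links(posts, threshold=0.5):
    """Give each link-less post the URL of its most similar linked post.

    `posts` is a list of dicts with keys 'text' and 'url' (None if absent).
    Returns one URL (or None) per post, in input order.
    """
    linked = [(tokens(p["text"]), p["url"]) for p in posts if p["url"]]
    result = []
    for p in posts:
        if p["url"]:
            result.append(p["url"])
            continue
        toks = tokens(p["text"])
        best_url, best_sim = None, 0.0
        for other_toks, url in linked:
            sim = jaccard(toks, other_toks)
            if sim > best_sim:
                best_url, best_sim = url, sim
        result.append(best_url if best_sim >= threshold else None)
    return result
```

In this sketch a post without a link inherits a URL only when its token overlap with a linked post reaches the threshold; posts with no sufficiently similar neighbour stay unlinked.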
 
Files: BibTeX, cmc@gscl_paper.pdf