DFKI-LT - Bootstrapped Extraction of Index Terms from Normalized User-Generated Content
Proceedings of the 2nd Workshop on Natural Language Processing for Computer-mediated Communication and Social Media
We report on the extraction of key phrases for news events, based on string alignment between social media posts and the web documents that users link to. We test hashtag normalization as a way to increase string similarity, and we test both token-based tweet similarity and manual event annotations as ways to transfer web links to posts that do not refer to external documents. With web link transfer we identify more terms than without it, and the terms we obtain are syntactically and semantically more complex than those from general document-based term extraction.
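To make the link-transfer idea concrete, the following is a minimal sketch and not the paper's implementation. It assumes that hashtag normalization means dropping the `#` and splitting CamelCase tags, that token-based similarity is Jaccard overlap of lowercased token sets, and that a post without a link inherits the link of its most similar linked post when the similarity reaches a cutoff. The function names, the 0.5 threshold, and the example URL are all hypothetical.

```python
import re


def normalize_hashtags(text):
    """Assumed normalization: drop '#' and split CamelCase hashtags into words."""
    def split_camel_case(match):
        # Insert a space at every lowercase-to-uppercase boundary.
        return re.sub(r"(?<=[a-z])(?=[A-Z])", " ", match.group(1))
    return re.sub(r"#(\w+)", split_camel_case, text)


def tokens(text):
    """Lowercased alphanumeric token set of the hashtag-normalized text."""
    return set(re.findall(r"[a-z0-9]+", normalize_hashtags(text).lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets (0.0 when both are empty)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def transfer_links(posts, threshold=0.5):
    """posts: list of (text, url_or_None) pairs.

    Posts without a URL inherit the URL of the most similar linked post
    if the similarity is at least `threshold` (a hypothetical cutoff).
    """
    linked = [(tokens(text), url) for text, url in posts if url is not None]
    result = []
    for text, url in posts:
        if url is None and linked:
            post_tokens = tokens(text)
            best_sim, best_url = max(
                ((jaccard(post_tokens, lt), lu) for lt, lu in linked),
                key=lambda pair: pair[0],
            )
            if best_sim >= threshold:
                url = best_url
        result.append((text, url))
    return result


example_posts = [
    ("Leaders meet at #ClimateSummit in Paris", "http://example.org/a"),
    ("climate summit leaders meet in Paris today", None),
    ("my cat is asleep", None),
]
transferred = transfer_links(example_posts)
```

In this toy input the second post shares six of eight distinct tokens with the linked post once the hashtag is normalized, so it inherits the link, while the unrelated third post stays unlinked. Without normalization, `#ClimateSummit` would yield the single token `climatesummit` and the overlap would fall below the assumed threshold, which illustrates why normalization can help string similarity.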