Processing and Normalizing Hashtags

Thierry Declerck; Piroska Lendvai

In: Galia Angelova; Kalina Bontcheva; Ruslan Mitkov (eds.). Proceedings of RANLP 2015. International Conference on Recent Advances in Natural Language Processing (RANLP-15), September 7-9, Hissar, Bulgaria, Pages 104-110, ISSN 1313-8502, INCOMA Ltd, Shoumen, Bulgaria, 9/2015.


We present ongoing work on the linguistic processing of hashtags in Twitter text, with the goal of supplying normalized hashtag content for use in more complex natural language processing (NLP) tasks. Hashtags are collectively shared topic designators with considerable surface variation, which can hamper semantic interpretation. Our normalization scripts allow for the lexical consolidation and segmentation of hashtags, potentially leading to improved semantic classification.
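
The two normalization steps named in the abstract, segmentation and lexical consolidation, can be illustrated with a minimal sketch. This is not the authors' script: the function names `segment_hashtag` and `consolidate` are hypothetical, segmentation here assumes that word boundaries are signalled by case changes, digits, or underscores, and consolidation assumes that variants differ only in letter case.

```python
import re
from collections import defaultdict


def segment_hashtag(tag):
    """Split a hashtag into tokens at case and letter/digit boundaries.

    Assumption: boundaries are marked by capitalization, digits, or
    underscores. All-lowercase hashtags are returned as a single token.
    """
    body = tag.lstrip("#")
    # Uppercase runs (acronyms), capitalized or lowercase words, digit runs.
    return re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", body)


def consolidate(tags):
    """Group surface variants that share the same case-folded form."""
    groups = defaultdict(set)
    for tag in tags:
        key = tag.lstrip("#").lower()
        groups[key].add(tag)
    return dict(groups)


if __name__ == "__main__":
    print(segment_hashtag("#NaturalLanguageProcessing"))
    print(segment_hashtag("#RANLP2015"))
    print(consolidate(["#Twitter", "#twitter", "#TWITTER", "#NLP"]))
```

A case-based heuristic of this kind cannot split hashtags written entirely in lowercase; handling those would need a lexicon or a statistical segmenter, which this sketch leaves out.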


ranlp2015_hashtag_final.pdf (PDF, 272 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence