DFKI-LT - SentiMerge: Combining Sentiment Lexicons in a Bayesian Framework

Guy Emerson, Thierry Declerck
SentiMerge: Combining Sentiment Lexicons in a Bayesian Framework
Proceedings of the Workshop on Lexical and Grammatical Resources for Language Processing, Dublin, Ireland, The COLING 2014 Organizing Committee, Dublin, 8/2014
Many approaches to sentiment analysis rely on a lexicon that labels words with a prior polarity. This is particularly true for languages other than English, where labelled training data is not easily available. Efforts to produce such lexicons already exist, and to avoid duplicated effort, a principled way to combine multiple resources is required. In this paper, we introduce a Bayesian probabilistic model, which can simultaneously combine polarity scores from several data sources and estimate the quality of each source. We apply this algorithm to a set of four German sentiment lexicons to produce the SentiMerge lexicon, which we make publicly available. In a simple classification task, we show that this lexicon outperforms each of the underlying resources, as well as a majority vote model.
Files: BibTeX, Lg-Lp-TrendMiner-2014-1.pdf