Publication

SentiMerge: Combining Sentiment Lexicons in a Bayesian Framework

Guy Emerson; Thierry Declerck
In: Proceedings of the Workshop on Lexical and Grammatical Resources for Language Processing. Workshop on Lexical and Grammatical Resources for Language Processing (LG-LP-14), located at the 25th International Conference on Computational Linguistics, August 24, Dublin, Ireland, The COLING 2014 Organizing Committee, Dublin, 8/2014.

Abstract

Many approaches to sentiment analysis rely on a lexicon that labels words with a prior polarity. This is particularly true for languages other than English, where labelled training data is not easily available. Several such lexicons already exist, and to avoid duplicated effort, a principled way to combine multiple resources is required. In this paper, we introduce a Bayesian probabilistic model, which can simultaneously combine polarity scores from several data sources and estimate the quality of each source. We apply this algorithm to a set of four German sentiment lexicons, to produce the SentiMerge lexicon, which we make publicly available. In a simple classification task, we show that this lexicon outperforms each of the underlying resources, as well as a majority vote model.
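To make the idea of jointly combining scores and estimating source quality concrete, the following is a minimal sketch and not the model from the paper. It assumes each source reports a word's polarity as a latent true value plus Gaussian noise with a source-specific variance, and it alternates between a precision-weighted average per word and a re-estimate of each source's variance. The function name `combine_lexicons`, its parameters, and the input layout (a dict mapping source name to a non-empty dict of word scores) are hypothetical choices made for this illustration.

```python
def combine_lexicons(lexicons, iterations=20, floor=1e-3):
    """Merge polarity scores from several sources (illustrative sketch only).

    lexicons: dict mapping source name -> non-empty dict of word -> score.
    Returns (merged, variance): merged word scores and one estimated noise
    variance per source, where a larger variance means a less reliable source.
    """
    sources = list(lexicons)
    words = set().union(*(lexicons[s].keys() for s in sources))
    # Assumption: start by trusting every source equally.
    variance = {s: 1.0 for s in sources}
    merged = {}

    for _ in range(iterations):
        # Step 1: precision-weighted mean of the scores available for each word.
        for w in words:
            weighted_sum = 0.0
            total_precision = 0.0
            for s in sources:
                if w in lexicons[s]:
                    precision = 1.0 / variance[s]
                    weighted_sum += precision * lexicons[s][w]
                    total_precision += precision
            merged[w] = weighted_sum / total_precision

        # Step 2: re-estimate each source's variance from its squared residuals.
        # The floor keeps a single source from receiving unbounded weight.
        for s in sources:
            residuals = [(score - merged[w]) ** 2
                         for w, score in lexicons[s].items()]
            variance[s] = max(floor, sum(residuals) / len(residuals))

    return merged, variance
```

Words missing from a source are simply skipped for that source, so lexicons with different coverage can be merged. A source that disagrees with the others ends up with a larger estimated variance and therefore less influence on the merged scores.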
