DFKI-LT - An Empirical Comparison of Unknown Word Prediction Methods

Kostadin Cholakov, Gertjan van Noord, Valia Kordoni, Yi Zhang
Proceedings of the 5th International Joint Conference on Natural Language Processing, Chiang Mai, Thailand, Springer, 2011
We compare two types of methods for handling unknown words in computational grammars. Methods of the first type are based on the idea of supertagging: they use a tagger to predict lexical descriptions for unknown tokens in a given input. Methods of the second type perform lexical acquisition (LA), which in this paper refers to the automatic acquisition of new lexical entries for the lexicon of a given grammar. The methods are compared by their effect on the parsing coverage and accuracy of the GG grammar of German (Crysmann, 2003). In particular, we adapt the LA method of Cholakov and van Noord (2010), originally developed for the Dutch Alpino system, for use with the GG. We compare its impact on coverage and accuracy on a test corpus of German newspaper texts with previously reported results on the same corpus for tagger-based methods. Furthermore, a smaller experiment shows that the linguistic knowledge this LA method provides can also be used for sentence realisation.