Acquiring a Poor Man's Inflectional Lexicon for German

Peter Adolphs

In: European Language Resources Association (ELRA) (ed.). Proceedings of the Sixth International Language Resources and Evaluation (LREC'08). 6th International Conference on Language Resources and Evaluation (LREC), Marrakech, Morocco, ELRA, 2008.


Many NLP modules and applications require the availability of a module for wide-coverage inflectional analysis. One way to obtain such analyses is to use a morphological analyser in combination with an inflectional lexicon. Since large text corpora nowadays are easily available and inflectional systems are in general well understood, it seems feasible to acquire lexical data from raw texts, guided by our knowledge of inflection. I present an acquisition method along these lines for German. The general idea can be roughly summarised as follows: first, generate a set of lexical entry hypotheses for each word-form in the corpus; then, select hypotheses that explain the word-forms found in the corpus "best". To this end, I have turned an existing morphological grammar, cast in finite-state technology (Schmid et al. 2004), into a hypothesiser for lexical entries. Irregular forms are simply listed so that they do not interfere with the regular rules used in the hypothesiser. Running the hypothesiser on a text corpus yields a large number of lexical entry hypotheses. These are then ranked according to their validity with the help of a statistical model that is based on the number of attested and predicted word forms for each hypothesis.
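
The two steps summarised in the abstract (hypothesise lexical entries for each word-form, then rank them by attested versus predicted forms) can be illustrated with a toy sketch. This is not the finite-state grammar or the statistical model used in the paper, neither of which is specified here. The suffix table `PARADIGMS`, the class names, and the scoring rule (share of predicted forms that are attested, with ties broken by the number of attested forms) are all assumptions made for illustration.

```python
# Toy sketch only: hypothetical inflection classes, each a list of suffixes
# that attach to a stem. A real morphological grammar is far richer.
PARADIGMS = {
    "N_en": ["", "en"],             # e.g. Frau, Frauen
    "N_e": ["", "e", "es", "en"],   # e.g. Tag, Tage, Tages, Tagen
    "N_s": ["", "s"],               # e.g. Auto, Autos
}


def predicted_forms(stem, paradigm):
    """All word-forms a (stem, paradigm) lexical entry would generate."""
    return {stem + suffix for suffix in PARADIGMS[paradigm]}


def hypothesise(word_form):
    """Every (stem, paradigm) pair that could have produced word_form."""
    hypotheses = set()
    for paradigm, suffixes in PARADIGMS.items():
        for suffix in suffixes:
            # Require a non-empty stem once the suffix is stripped.
            if word_form.endswith(suffix) and len(word_form) > len(suffix):
                stem = word_form[: len(word_form) - len(suffix)]
                hypotheses.add((stem, paradigm))
    return hypotheses


def rank(corpus_forms):
    """Score each hypothesis by attested / predicted forms (assumed rule)."""
    attested = set(corpus_forms)
    candidates = set().union(*(hypothesise(w) for w in attested))
    scored = []
    for hyp in candidates:
        forms = predicted_forms(*hyp)
        found = forms & attested
        scored.append((len(found) / len(forms), len(found), hyp))
    # Best ratio first; ties go to the entry with more attested forms.
    return sorted(scored, key=lambda t: (-t[0], -t[1], t[2]))


if __name__ == "__main__":
    for ratio, n_found, hyp in rank(["Tag", "Tage", "Tages", "Tagen"])[:3]:
        print(hyp, n_found, ratio)
```

Even on this small input several hypotheses explain all of their predicted forms, which is why a plain ratio needs a tie-breaker and why a more careful statistical model is needed in practice.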


German Research Center for Artificial Intelligence