DFKI-LT - Spiralling towards perfection: an incremental approach for mutual lexicon-tagger improvement

Karlheinz Moerth, Stephan Procházka, Omar Siam, Thierry Declerck
Spiralling towards perfection: an incremental approach for mutual lexicon-tagger improvement
in: Jelena Kallas, Iztok Kosem (eds.):
1 Proceedings of eLex 2013, Tallinn, Estonia, Trojina, Ljubljana, 10/2013
 
Our paper describes an experiment in which four different digital language resources are used to incrementally create added value in one another. The resources are a digital dictionary, a morphological analyser, a tagger and a digital corpus. We show how the dictionary is used to improve the tagger; how the tagger is used to annotate a collaboratively produced digital text collection, i.e. the Egyptian language Wikipedia, thus improving easily available open data; and lastly how the results of the annotation process are, in turn, utilised to enhance and improve the dictionary. The paper touches on several issues related to the particular tasks involved in the process: we discuss problems of dealing with data retrieved from the internet; we give details on lemmatisation, the creation of word-class information and the generation of frequency data from the corpus; and we touch on issues of dictionary creation and aspects of the dictionary-corpus interface. A final topic is standards for the representation of statistical information in the digital dictionary.
 
Files: BibTeX, eLex2013_moerth_prochazka_siam_declerck.pdf