Spiralling towards perfection: an incremental approach for mutual lexicon-tagger improvement

Karlheinz Moerth, Stephan Procházka, Omar Siam, Thierry Declerck

In: Jelena Kallas, Iztok Kosem (eds.). Proceedings of eLex 2013. Biennial Conference on Electronic Lexicography (eLex-13), located at Electronic Lexicography in the 21st Century: Thinking outside the Paper, October 17-19, Tallinn, Estonia, Trojina, Ljubljana, 10/2013.


Our paper describes an experiment in which four different digital language resources are used to incrementally add value to one another. The resources are a digital dictionary, a morphological analyser, a tagger and a digital corpus. We show how the dictionary is used to improve the tagger; how the tagger is used to annotate a collaboratively produced digital text collection, i.e. the Egyptian language Wikipedia, thus improving easily available open data; and lastly how the results of the annotation process are, in turn, used to enhance and improve the dictionary. The paper touches on several issues related to the particular tasks involved in the process: we discuss problems of dealing with data retrieved from the internet, we give details on lemmatisation, the creation of word-class information and the generation of frequency data from the corpus, and we touch on issues of dictionary creation and aspects of the dictionary-corpus interface. A final topic is standards for the representation of statistical information in the digital dictionary.
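
The last step of the cycle described above, feeding corpus results back into the dictionary, might be sketched roughly as follows. This is only an illustration and not the authors' implementation: the tagger output format, the dictionary layout, the function names lemma_frequencies and enrich, and the toy word forms are all assumptions made up for this example.

```python
from collections import Counter

# Hypothetical tagger output: (token, lemma, word class) triples.
# The forms below are invented toy data, not taken from the paper.
tagged_corpus = [
    ("kutub", "kitab", "noun"),
    ("kitab", "kitab", "noun"),
    ("katab", "katab", "verb"),
    ("kitab", "kitab", "noun"),
    ("bet", "bet", "noun"),
]

# Hypothetical dictionary keyed by (lemma, word class).
dictionary = {
    ("kitab", "noun"): {"gloss": "book"},
    ("katab", "verb"): {"gloss": "to write"},
}


def lemma_frequencies(tagged):
    """Count how often each (lemma, word class) pair occurs in the corpus."""
    return Counter((lemma, pos) for _token, lemma, pos in tagged)


def enrich(entries, freqs):
    """Return a copy of the entries with a corpus frequency attached,
    plus the (lemma, word class) pairs seen in the corpus but not yet
    in the dictionary, as candidates for new entries."""
    enriched = {
        key: {**entry, "freq": freqs.get(key, 0)}
        for key, entry in entries.items()
    }
    missing = sorted(key for key in freqs if key not in entries)
    return enriched, missing


freqs = lemma_frequencies(tagged_corpus)
enriched, missing = enrich(dictionary, freqs)
```

In this sketch the frequency counts enrich existing entries, while the list of unmatched lemmas points to gaps in the dictionary; filling those gaps would in turn give the tagger a better lexicon on the next pass.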


eLex2013_moerth_prochazka_siam_declerck.pdf (pdf, 483 KB)

German Research Center for Artificial Intelligence