
Publication

Spiralling towards perfection: an incremental approach for mutual lexicon-tagger improvement

Karlheinz Moerth; Stephan Procházka; Omar Siam; Thierry Declerck
In: Jelena Kallas; Iztok Kosem (eds.). Proceedings of eLex 2013. Biennial Conference on Electronic Lexicography (eLex-13), located at Electronic Lexicography in the 21st Century: Thinking outside the Paper, October 17-19, Tallinn, Estonia, Trojina, Ljubljana, 10/2013.

Abstract

Our paper describes an experiment in which four different digital language resources are used to incrementally create added value in one another. The resources are a digital dictionary, a morphological analyser, a tagger and a digital corpus. We will show how the dictionary is used to improve the tagger, how the tagger is used to annotate a collaboratively produced digital text collection, i.e. the Egyptian language Wikipedia, thus improving easily available open data, and lastly how the results of the annotation process are, in turn, utilised to enhance and improve the dictionary. The paper touches on several issues related to the particular tasks involved in the process: we discuss problems of dealing with data retrieved from the internet; we give details on the lemmatisation, the creation of word-class information and the generation of frequency data from the corpus; and we touch on issues of dictionary creation and aspects of the dictionary-corpus interface. A final topic is the set of standards for representing statistical information in the digital dictionary.
