Automatized Merging of Italian Lexical Resources

Thierry Declerck, Stefania Racioppa, Karlheinz Mörth

In: Núria Bel, Maria Gavrilidou, Monica Monachini, Valeria Quochi, Laura Rimell (eds.). Proceedings of the LREC 2012 Workshop on Language Resource Merging, held at the 8th International Conference on Language Resources and Evaluation (LREC-12), May 22, Istanbul, Turkey. ELRA, Paris, 5/2012.


In the context of a recently started European project, TrendMiner, there is a need for broad lexical coverage of various languages, among them Italian. The lexicon should include morphological, syntactic and semantic information, but also features representing the level of opinion or sentiment that the lexical entries can express. Since no such ready-to-use lexicon exists yet, we investigated the possibility of accessing and merging various Italian lexical resources. Our starting point was the freely available Morph-it! lexicon, which contains inflected forms together with their lemma and morphological features. We transformed the textual format of Morph-it! into a database schema in order to support the integration process with other resources. We then considered the Italian lexicon entries available in various versions of Wiktionary for adding further information, such as the origin, uses and senses of the entries. We explore the need for a standardized representation of lexical resources in order to better integrate the lexical information coming from the distinct sources, and we also describe a first conversion of the lexical information into a computational lexicon.
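The two integration steps described above (loading Morph-it! into a database, then attaching information from another resource such as Wiktionary) can be illustrated with a minimal sketch. It assumes that Morph-it! lines consist of three tab-separated fields (inflected form, lemma, morphological features); the table layout, the function names `load_morphit`, `merge_senses` and `forms_with_senses`, the sample entries and the English glosses are all hypothetical and are not the schema used by the authors. The merge is a naive exact match on the lemma string; a real merge would also need to reconcile part of speech and sense distinctions across sources.

```python
import sqlite3

# Hypothetical sample in the assumed Morph-it! layout:
# inflected form <TAB> lemma <TAB> morphological features
SAMPLE = """cane\tcane\tNOUN-M:s
cani\tcane\tNOUN-M:p
parlo\tparlare\tVER:ind+pres+1+s
parla\tparlare\tVER:ind+pres+3+s
"""


def load_morphit(conn, text):
    """Load (form, lemma, features) rows into a small normalised schema (hypothetical)."""
    cur = conn.cursor()
    cur.executescript("""
        CREATE TABLE lemma (id INTEGER PRIMARY KEY, lemma TEXT UNIQUE);
        CREATE TABLE form (
            id INTEGER PRIMARY KEY,
            form TEXT NOT NULL,
            features TEXT NOT NULL,
            lemma_id INTEGER NOT NULL REFERENCES lemma(id)
        );
        -- senses/glosses contributed by other resources, tagged with their source
        CREATE TABLE sense (
            lemma_id INTEGER NOT NULL REFERENCES lemma(id),
            source TEXT NOT NULL,
            gloss TEXT NOT NULL
        );
    """)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        form, lemma, features = line.split("\t")
        # One row per distinct lemma; inflected forms point to it.
        cur.execute("INSERT OR IGNORE INTO lemma (lemma) VALUES (?)", (lemma,))
        (lemma_id,) = cur.execute(
            "SELECT id FROM lemma WHERE lemma = ?", (lemma,)).fetchone()
        cur.execute("INSERT INTO form (form, features, lemma_id) VALUES (?, ?, ?)",
                    (form, features, lemma_id))
    conn.commit()


def merge_senses(conn, source, senses):
    """Attach (lemma, gloss) pairs from another resource; return lemmas with no match."""
    cur = conn.cursor()
    unmatched = []
    for lemma, gloss in senses:
        row = cur.execute("SELECT id FROM lemma WHERE lemma = ?", (lemma,)).fetchone()
        if row is None:
            unmatched.append(lemma)  # not covered by the morphological lexicon
            continue
        cur.execute("INSERT INTO sense VALUES (?, ?, ?)", (row[0], source, gloss))
    conn.commit()
    return unmatched


def forms_with_senses(conn, lemma):
    """List every inflected form of a lemma together with any merged gloss."""
    return conn.execute("""
        SELECT f.form, f.features, s.gloss FROM form f
        JOIN lemma l ON l.id = f.lemma_id
        LEFT JOIN sense s ON s.lemma_id = l.id
        WHERE l.lemma = ? ORDER BY f.id
    """, (lemma,)).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_morphit(conn, SAMPLE)
    print(merge_senses(conn, "wiktionary-en", [("cane", "dog"), ("gatto", "cat")]))
    # → ['gatto']
    print(forms_with_senses(conn, "cane"))
    # → [('cane', 'NOUN-M:s', 'dog'), ('cani', 'NOUN-M:p', 'dog')]
```

Keeping lemmas in their own table means that information from each additional resource is attached once per lemma and is then shared by all of its inflected forms, while the `source` column records which resource contributed each gloss.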



Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence