Inducing a Computational Lexicon from a Corpus with Syntactic and Semantic Information

Dennis Spohr, Aljoscha Burchardt, Sebastian Pado, Anette Frank, Ulrich Heid

In: Proceedings of the 7th International Workshop on Computational Semantics (IWCS-07), January 2007.


To date, linguistically annotated corpora have mainly been used for feature-based training of automatic labelling systems. In this paper, we present a general approach to modelling multi-layered annotated corpora in Description Logics. The approach offers (i) flexible, enhanced querying functionality that goes beyond current XML-based query languages, (ii) a basis for consistency checking, and (iii) a general method for defining abstractions over corpus annotations. We apply this method to the syntactically and semantically annotated SALSA/TIGER corpus. By defining abstractions over the corpus data, we generalise from a large set of individual corpus annotations to a corresponding lexicon model. We discuss issues that arise from formalising multi-layered corpus annotations in Description Logics and illustrate the benefits of our approach with concrete examples.
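The abstract describes generalising from many individual annotations to lexicon entries, plus checking annotations for consistency. The paper does this in Description Logics; purely to illustrate the underlying idea, the plain-Python sketch below groups individual frame annotations by (lemma, frame) and counts the valence patterns observed for each pair. It then flags pairs whose frame is not in a permitted-frame inventory. The `FrameAnnotation` record, its fields, the `allowed_frames` inventory and both functions are hypothetical, and simple dictionary aggregation is a stand-in for, not an implementation of, DL-based modelling or reasoning.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class FrameAnnotation:
    """One frame-evoking target in one corpus sentence.

    Hypothetical schema for illustration only; not the paper's DL model.
    """
    sentence_id: str
    lemma: str
    frame: str
    # (frame element, grammatical function) pairs, e.g. ("Speaker", "SB")
    roles: tuple[tuple[str, str], ...]


def induce_lexicon(annotations):
    """Abstract over individual annotations: group them by (lemma, frame)
    and count how often each valence pattern is observed for that pair."""
    lexicon = defaultdict(Counter)
    for ann in annotations:
        # Sort the role/function pairs so that the order in which roles
        # were annotated does not produce spurious distinct patterns.
        pattern = tuple(sorted(ann.roles))
        lexicon[(ann.lemma, ann.frame)][pattern] += 1
    return lexicon


def inconsistent_lemmas(lexicon, allowed_frames):
    """Toy consistency check: return the (lemma, frame) pairs whose frame
    is not listed for that lemma in a hypothetical inventory."""
    return sorted(
        (lemma, frame)
        for (lemma, frame) in lexicon
        if frame not in allowed_frames.get(lemma, set())
    )


if __name__ == "__main__":
    # Invented example data; SB, OA and DA are TIGER grammatical-function
    # labels (subject, accusative object, dative object).
    corpus = [
        FrameAnnotation("s1", "sagen", "Statement",
                        (("Speaker", "SB"), ("Message", "OA"))),
        FrameAnnotation("s2", "sagen", "Statement",
                        (("Message", "OA"), ("Speaker", "SB"))),
        FrameAnnotation("s3", "sagen", "Statement",
                        (("Speaker", "SB"), ("Addressee", "DA"),
                         ("Message", "OA"))),
    ]
    lexicon = induce_lexicon(corpus)
    for (lemma, frame), patterns in lexicon.items():
        for pattern, count in patterns.items():
            print(lemma, frame, pattern, count)
    print(inconsistent_lemmas(lexicon, {"sagen": {"Statement"}}))
```

Here the two annotations in `s1` and `s2` collapse into a single lexicon-level pattern with count 2, which is the kind of generalisation from corpus instances to lexicon entries that the abstract refers to.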

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence