DFKI-LT - Inducing a Computational Lexicon from a Corpus with Syntactic and Semantic Information
Proceedings of the 7th International Workshop on Computational Semantics (IWCS-07), volume 1/2007, 2007
To date, linguistically annotated corpora are mainly exploited for feature-based training of automatic labelling systems. In this paper, we present a general approach for the Description Logics-based modelling of multi-layered annotated corpora that offers (i) flexible and enhanced querying functionality that goes beyond current XML-based query languages, (ii) a basis for consistency checking, and (iii) a general method for defining abstractions over corpus annotations. We apply this method to the syntactically and semantically annotated SALSA/TIGER corpus. By defining abstractions over the corpus data, we generalise from a large set of individual corpus annotations to a corresponding lexicon model. We discuss issues arising from the formalisation of multi-layered corpus annotations in Description Logics and illustrate the benefits of our approach with concrete examples.