Using a Parameterizable and Domain-Adaptive Information Extraction System for Annotating Large-Scale Corpora?

Günter Neumann; Thierry Declerck
In: Proceedings of the Pre-Conference Workshop "Information Extraction meets Corpus Linguistics", May 30. International Conference on Language Resources and Evaluation (LREC), 2000.


In this paper we describe a parameterizable and domain-adaptive Information Extraction (IE) system for German texts and present some ideas on how this kind of system could effectively support Corpus Linguistics (CL) tasks. We also tentatively address the complementary question of how corpus linguistics can benefit IE, especially in the automatic learning of templates of interest for IE tasks, a topic that is crucial for the further development of highly flexible IE systems. We briefly describe some of the steps taken to adapt the IE system to a new domain, in order to illustrate the points at which, in our opinion, IE and CL should cooperate more closely.