Publication

Domain Adaptive Information Extraction

Günter Neumann; Thierry Declerck

In: Proceedings of the International Workshop on Innovative Language Technology and Chinese Information Processing (ILT & CIP '01), April 2001.

Abstract

In this paper we present the methodology developed within the PARADIME (Parameterizable Domain-Adaptive Information and Message Extraction) project for designing an Information Extraction (IE) system that is easily adaptable to new application domains. To this end, we opted for a strict separation of the (shallow) linguistic processing modules on the one hand and the domain-modeling modules on the other, aiming for the maximal degree of reusability of the common linguistic resources shared by all application domains. The tools used for domain modeling allow a declarative description of the domain under consideration and a simple (abstract) mapping to the output of the Natural Language (NL) analysis, so that adapting the IE system to new applications requires only a little, very general linguistic knowledge. We describe a real-scale experiment on a fast adaptation cycle of the system to a new domain, the soccer domain, and present the first results obtained.

Neumann_2001_DAI.pdf (pdf, 95 KB)