Project | WHITEBOARD

Duration: 01/01/2000 - 12/31/2002

Multilevel Annotation for Dynamic Free Text Processing

The project aims to design, implement, investigate and evaluate a new system architecture that facilitates the combination of different language technologies (LT) for a range of practical applications. Language technologies offer numerous means for the partial analysis of texts, which can be employed for information retrieval, information extraction, language checking, and many other applications. Processing methods and tools differ along several dimensions, e.g., the level of linguistic description, the depth of analysis, or the way knowledge of language is derived (linguistically or statistically). Methods often overlap in their functionality but differ in their strengths and weaknesses. Finding optimal combinations of heterogeneous techniques and processing components is one of the most difficult tasks in language processing, and it is the challenge the WHITEBOARD project addresses.

The novel architecture to be developed and explored in WHITEBOARD is based on the concept of an annotated text. The different LT components enrich an XML-encoded text with layers of new meta-information that are also represented in XML. Each component can exploit or disregard previously assigned annotations. The WHITEBOARD architecture has a single shared data structure, which serves at the same time as the input, the intermediate working data, and the output of the system. The envisaged architecture permits the pragmatic combination of different processing approaches, most notably novel ways of combining shallow and deep methods.
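The idea of a single annotated text that every component reads and extends can be illustrated with a minimal sketch. It is not the WHITEBOARD implementation: the element names (`document`, `text`, `layer`, `span`), the three toy components, and the use of character offsets are all assumptions made for illustration, using only Python's standard `xml.etree.ElementTree` module.

```python
import xml.etree.ElementTree as ET


def new_document(text):
    """Wrap raw text in a root element that will collect annotation layers."""
    doc = ET.Element("document")
    ET.SubElement(doc, "text").text = text
    return doc


def add_layer(doc, name):
    """Append a new, empty annotation layer to the shared document."""
    return ET.SubElement(doc, "layer", {"name": name})


def add_span(layer, start, end):
    """Record an annotation as character offsets into the original text."""
    ET.SubElement(layer, "span", {"start": str(start), "end": str(end)})


def tokenize(doc):
    """Hypothetical component 1: whitespace tokens as a 'token' layer."""
    text = doc.find("text").text
    layer = add_layer(doc, "token")
    pos = 0
    for word in text.split():
        start = text.index(word, pos)
        pos = start + len(word)
        add_span(layer, start, pos)


def mark_capitalized(doc):
    """Hypothetical component 2: exploits the 'token' layer if it exists."""
    text = doc.find("text").text
    tokens = doc.find("layer[@name='token']")
    if tokens is None:
        return  # nothing to build on, so add no layer
    layer = add_layer(doc, "capitalized")
    for span in tokens:
        start, end = int(span.get("start")), int(span.get("end"))
        if text[start].isupper():
            add_span(layer, start, end)


def count_characters(doc):
    """Hypothetical component 3: disregards all earlier annotations."""
    layer = add_layer(doc, "stats")
    layer.set("characters", str(len(doc.find("text").text)))


def run_pipeline(text, components=(tokenize, mark_capitalized, count_characters)):
    """The same document is the input, working data, and output of each step."""
    doc = new_document(text)
    for component in components:
        component(doc)
    return doc


if __name__ == "__main__":
    doc = run_pipeline("WHITEBOARD combines shallow and deep methods")
    print(ET.tostring(doc, encoding="unicode"))
```

Because annotations are stored as offsets beside the text rather than inside it, a later component can add a layer without disturbing the layers written before it. That is one possible design choice for such a sketch, not a statement about how WHITEBOARD encodes its layers.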

WHITEBOARD will be built on top of existing DFKI LT components: the morphological processing system MORPHIX, the tagger and phrase parsers TnT and Chunkie, the information extraction system SMES, the efficient HPSG parsing system PET, HPSG grammars for German, English (Stanford's LinGO grammar) and Japanese, and the controlled language checking system FLAG.
Two applications are realized to evaluate and demonstrate the results. One application is information extraction (IE). As the automatic understanding of entire texts will remain out of reach for quite some time, the strategy for approaching this goal is the gradual improvement of our IE technology.
The second application is controlled language checking. Here again, we cannot expect today's technology to deliver a comprehensive and correct analysis of an entire text. We might be able, however, to specialize our deep analysis so that it applies with sufficient precision in those environments that are critical for the correct diagnosis and correction of errors.
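One way to read the strategy for controlled language checking is as a control flow in which a cheap shallow test selects the critical environments and the expensive deep analysis runs only there. The sketch below shows that control flow and nothing more. The passive-voice rule, the regular expression, and the function names are hypothetical, and `deep_analyze` is a caller-supplied stand-in, not an interface of PET or FLAG.

```python
import re

# Hypothetical shallow trigger: a form of "be" followed by a word ending in "-ed".
# It deliberately over-generates (e.g. "is red"); the deep step has the final say.
PASSIVE_HINT = re.compile(r"\b(?:is|are|was|were|been|be)\s+\w+ed\b", re.IGNORECASE)


def needs_deep_analysis(sentence):
    """Cheap test that decides whether a sentence is a critical environment."""
    return PASSIVE_HINT.search(sentence) is not None


def check_sentences(sentences, deep_analyze, trigger=needs_deep_analysis):
    """Run deep_analyze only on triggered sentences and collect its diagnoses.

    deep_analyze takes a sentence and returns a diagnosis string, or None
    if the sentence turns out to be acceptable.
    """
    findings = []
    for sentence in sentences:
        if trigger(sentence):
            diagnosis = deep_analyze(sentence)
            if diagnosis is not None:
                findings.append((sentence, diagnosis))
    return findings
```

In this arrangement the shallow trigger only needs good recall, since a false alarm such as "The lamp is red." costs one extra deep analysis, which can then reject it.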

Contact Person

Prof. Dr. Günter Neumann

Guenter.Neumann@dfki.de

Publications

Integrating Natural Language Processing Components with XML and XSLT
Ulrich Schäfer
ISBN 9783836490276, VDM Verlag Dr. Müller, Saarbrücken, 4/2008.
Bootstrapping Relation Extraction from Semantic Seeds
Feiyu Xu
PhD thesis, Saarland University, 2007.
A Bag of Useful Techniques for Unification-Based Finite-State Transducers
Hans-Ulrich Krieger; Witold Drozdzynski; Jakub Piskorski; Ulrich Schäfer; Feiyu Xu
In: Proceedings of 7th KONVENS. Konferenz zur Verarbeitung natürlicher Sprache (KONVENS), Vienna, Pages 105-112, 9/2004.

Funding Authorities

BMBF - Federal Ministry of Education and Research
