Multilevel Annotation for Dynamic Free Text Processing


The project aims to design, implement, investigate, and evaluate a new system architecture that facilitates the combination of different language technologies (LT) for a range of practical applications. Language technologies offer numerous means for the partial analysis of texts, which can be employed for information retrieval, information extraction, language checking, and many other applications. Processing methods and tools differ along several dimensions, e.g., the level of linguistic description, the depth of analysis, or the way knowledge of language is derived (linguistically or statistically). Methods often overlap in functionality but differ in their strengths and weaknesses. Finding optimal combinations of heterogeneous techniques and processing components is one of the most difficult tasks in language processing, and it is the challenge addressed by the WHITEBOARD project. The novel architecture to be developed and explored in WHITEBOARD is based on the concept of an annotated text. The different LT components enrich an XML-encoded text with layers of new meta-information that are also represented in XML. Each component can exploit or disregard previously assigned annotations. The WHITEBOARD architecture has a single shared data structure, which serves at the same time as the input, the intermediate representation, and the output of the system. The envisaged architecture permits the pragmatic combination of different processing approaches, most notably new ways of combining shallow and deep methods.
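The idea of a single shared, layered annotation structure can be illustrated with a minimal sketch. This is not the WHITEBOARD implementation: the element names (document, text, annotations, layer, span), the character-offset encoding, and the two toy components are all assumptions made for illustration only. The sketch shows one component adding a token layer and a second component that exploits that layer if it exists.

```python
import xml.etree.ElementTree as ET


def make_document(text):
    """Create the shared XML structure: raw text plus an empty annotation container.

    The element names used here are hypothetical, not the project's schema.
    """
    doc = ET.Element("document")
    ET.SubElement(doc, "text").text = text
    ET.SubElement(doc, "annotations")
    return doc


def add_layer(doc, name, spans):
    """Append one annotation layer; each span is (start, end, label)."""
    layer = ET.SubElement(doc.find("annotations"), "layer", {"name": name})
    for start, end, label in spans:
        ET.SubElement(
            layer, "span",
            {"start": str(start), "end": str(end), "label": label},
        )
    return layer


def tokenizer(doc):
    """Toy shallow component: whitespace tokenization with character offsets."""
    text = doc.find("text").text
    spans, pos = [], 0
    for tok in text.split():
        start = text.index(tok, pos)
        end = start + len(tok)
        spans.append((start, end, "token"))
        pos = end
    add_layer(doc, "tokens", spans)


def capitalized_tagger(doc):
    """Toy second component: reuses the token layer and marks capitalized tokens.

    If no token layer has been assigned yet, it adds nothing.
    """
    text = doc.find("text").text
    tokens = doc.find("annotations/layer[@name='tokens']")
    if tokens is None:
        return
    spans = []
    for span in tokens.findall("span"):
        start, end = int(span.get("start")), int(span.get("end"))
        if text[start:end][0].isupper():
            spans.append((start, end, "capitalized"))
    add_layer(doc, "capitalized", spans)
```

Calling make_document on a sentence and then running tokenizer followed by capitalized_tagger leaves one document that holds the text and both layers, so the same structure is the input, the intermediate state, and the output. Storing annotations as offsets into the unchanged text, rather than as inline markup, is a design choice of this sketch that keeps layers independent of one another.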

Funded by: Federal Ministry of Education and Research
Project Manager: Hans Uszkoreit (Hans.Uszkoreit@dfki.de)
Contact: Günter Neumann (Guenter.Neumann@dfki.de)
Duration: 01.01.2000 - 31.12.2002