Publication
Ulrich Schäfer
Saarbrücken Dissertations in Computational Linguistics and Language Te, Vol. 22, ISBN 978-3-933218-21-6, DFKI GmbH and Computational Linguistics Department, Saarland University, Saarbrücken, Germany, 6/2007.
In the main part of the thesis, we describe our contributions to three hybrid architecture frameworks that make use of these fundamentals. SProUT is a shallow system that uses elements of deep constraint-based processing, namely a type hierarchy and typed feature structures. Whiteboard is the first hybrid architecture to integrate not only part-of-speech tagging, but also named entity recognition and topological parsing, with deep parsing. Finally, we present Heart of Gold, a middleware architecture that generalizes Whiteboard along several dimensions, such as configurability, multilinguality and flexible processing strategies.
We describe various applications that have been implemented using the hybrid frameworks, including structured named entity recognition, information extraction, creative document authoring support and deep question analysis, and we report on evaluations. In Whiteboard, for example, shallow pre-processing was shown to increase both the coverage and the efficiency of deep parsing by a factor of more than two. Heart of Gold not only forms the basis for applications that utilize semantics-oriented natural language analysis, but also serves as a research instrument for experimenting with novel processing strategies that combine deep and shallow methods, and it eases the replication and comparison of results.