[Schaefer 2007] - The Book
Ulrich Schaefer: Integrating Deep and Shallow Natural
Language Processing Components - Representations and
Hybrid Architectures. Dissertation defended on
June 29, 2007, in the Faculty of Mathematics and
Computer Science, Saarland University. Download PDF:
324 pages A4, 3.3 MB.
Published as Volume 22, Saarbrücken Dissertations in
Computational Linguistics and Language Technology.
German Research Center for Artificial Intelligence
and Saarland University, Saarbrücken, Germany.
356 pages A5. ISBN 978-3-933218-21-6.
Order hardcopy here. BibTeX.
We describe basic concepts and software architectures for the
integration of shallow and deep (linguistics-based, semantics-oriented)
natural language processing (NLP) components. The main goal of this
novel, hybrid integration paradigm is to improve the robustness of deep
processing. After an introduction to constraint-based natural language
parsing, we give an overview of typical shallow processing tasks.
We introduce XML standoff markup as an additional abstraction layer
that eases integration of NLP components, and propose the use of
XSLT as a standardized and efficient transformation language for
online NLP integration.
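To illustrate the idea of standoff markup, here is a minimal sketch in Python. It is not taken from the thesis: the element and attribute names (annotations, span, start, end, type), the sample sentence, and the helper functions are all assumptions made for illustration. The raw text is never modified; annotations live in a separate XML document and refer to the text by character offsets, so several components can annotate the same text independently. The thesis proposes XSLT for transforming such annotations; the sketch uses plain Python for the lookup step only to stay within the standard library.

```python
import xml.etree.ElementTree as ET

# The raw text stays untouched; annotations point into it by character offsets.
TEXT = "Kim visited Saarbruecken in June."


def span_of(text, substring, label):
    """Return (start, end, label) for the first occurrence of substring."""
    start = text.index(substring)
    return (start, start + len(substring), label)


def make_standoff(spans):
    """Build a standoff annotation document from (start, end, label) spans.

    The element and attribute names are illustrative assumptions,
    not the format used in the thesis.
    """
    root = ET.Element("annotations")
    for i, (start, end, label) in enumerate(spans):
        ET.SubElement(root, "span", {
            "id": f"a{i}",
            "start": str(start),
            "end": str(end),
            "type": label,
        })
    return root


def resolve(text, root):
    """Map each annotation back to the substring of the text it covers."""
    return [
        (el.get("type"), text[int(el.get("start")):int(el.get("end"))])
        for el in root.iter("span")
    ]


doc = make_standoff([
    span_of(TEXT, "Kim", "person"),
    span_of(TEXT, "Saarbruecken", "location"),
])
print(ET.tostring(doc, encoding="unicode"))
print(resolve(TEXT, doc))
```

Because the annotations only carry offsets, a second component (say, a part-of-speech tagger) could add its own span elements without conflicting with the named entity layer.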
In the main part of the thesis, we describe our contributions to
three hybrid architecture frameworks that make use of these
fundamentals. SProUT is a shallow system that uses elements of deep
constraint-based processing, namely a type hierarchy and typed feature
structures. Whiteboard is the first hybrid architecture to
integrate not only part-of-speech tagging, but also named entity
recognition and topological parsing, with deep parsing. Finally, we
present Heart of Gold, a middleware architecture that generalizes
Whiteboard along several dimensions, such as configurability,
multilinguality, and flexible processing strategies.
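The type hierarchy and typed feature structures mentioned above for SProUT can be illustrated with a toy sketch in Python. It does not reflect SProUT's actual formalism or implementation: the types, the features, and the tuple-based representation are hypothetical, the hierarchy is a simple tree (single inheritance), and reentrancies (shared substructures) are left out.

```python
# Hypothetical toy type hierarchy, stored as child -> parent (None marks the top).
PARENT = {
    "entity": None,
    "person": "entity",
    "location": "entity",
    "city": "location",
}


def ancestors(t):
    """Return t followed by all of its supertypes, most specific first."""
    chain = []
    while t is not None:
        chain.append(t)
        t = PARENT[t]
    return chain


def unify_types(a, b):
    """Return the more specific type if one subsumes the other, else None."""
    if b in ancestors(a):
        return a
    if a in ancestors(b):
        return b
    return None


def unify(fs1, fs2):
    """Unify two feature structures given as (type, {feature: value}) pairs.

    Values are atomic strings or nested feature structures.
    Returns the unified structure, or None if the two are incompatible.
    """
    t = unify_types(fs1[0], fs2[0])
    if t is None:
        return None
    feats = dict(fs1[1])
    for name, val in fs2[1].items():
        if name not in feats:
            feats[name] = val
        elif isinstance(feats[name], tuple) and isinstance(val, tuple):
            sub = unify(feats[name], val)
            if sub is None:
                return None
            feats[name] = sub
        elif feats[name] != val:
            return None
    return (t, feats)


a = ("location", {"name": "Saarbruecken"})
b = ("city", {"country": "Germany"})
print(unify(a, b))
print(unify(("person", {}), ("city", {})))
```

In this sketch, unification succeeds when the two types are compatible in the hierarchy and no feature carries conflicting values; the result takes the more specific type and the union of the features.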
We describe various applications implemented with the hybrid
frameworks, such as structured named entity recognition,
information extraction, creative document authoring support, and deep
question analysis, and we report on evaluations. In Whiteboard, for
example, shallow pre-processing was shown to increase both the coverage
and the efficiency of deep parsing by a factor of more than two.
Heart of Gold not only forms the basis for applications that utilize
semantics-oriented natural language analysis, but also constitutes a
complex research instrument for experimenting with novel processing
strategies that combine deep and shallow methods, and makes results easier
to replicate and compare.