A Shallow Text Processing Core Engine

Günter Neumann, Jakub Piskorski

In: Journal of Computational Intelligence, 18(3), 2002.


In this paper we present SPPC, a high-performance system for intelligent extraction of structured data from free-text documents. SPPC consists of a set of domain-adaptive shallow core components that are realized by means of cascaded weighted finite state machines and generic dynamic tries. The system has been fully implemented for German; it includes morphological and on-line compound analysis, efficient POS-filtering, high-performance named entity recognition, and chunk parsing based on a novel divide-and-conquer strategy. The whole approach proved to be very useful for processing free-word-order languages like German. SPPC is fast (more than 6000 words per second on standard PC environments) and achieves high linguistic coverage, especially for the divide-and-conquer parsing strategy, where we obtained an f-measure of 87.14% on unseen data.
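
The abstract names dynamic tries and on-line compound analysis without showing how the two might fit together. The following is a minimal sketch, not SPPC's actual implementation: it assumes a plain character trie that can grow at run time and a backtracking segmenter that prefers longer lexicon entries. The names Trie, prefixes and split_compound, as well as the toy lexicon, are hypothetical and chosen only for illustration; weights, linking morphemes and morphological features are left out.

```python
from typing import Dict, List, Optional


class Trie:
    """A dictionary-backed character trie that can grow at run time (illustrative only)."""

    def __init__(self) -> None:
        self.children: Dict[str, "Trie"] = {}
        self.is_word = False

    def insert(self, word: str) -> None:
        """Add a word, creating any missing nodes along its path."""
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def prefixes(self, text: str, start: int) -> List[int]:
        """Return every end index e such that text[start:e] is a stored word."""
        ends: List[int] = []
        node = self
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_word:
                ends.append(i + 1)
        return ends


def split_compound(lexicon: Trie, word: str, start: int = 0) -> Optional[List[str]]:
    """Segment word[start:] into lexicon entries, trying longer prefixes first.

    Returns None if no complete segmentation exists.
    """
    if start == len(word):
        return []
    # prefixes() yields end indices in increasing order; reverse to prefer long matches.
    for end in reversed(lexicon.prefixes(word, start)):
        rest = split_compound(lexicon, word, end)
        if rest is not None:
            return [word[start:end]] + rest
    return None


if __name__ == "__main__":
    lexicon = Trie()
    for entry in ("daten", "bank", "system"):  # toy lexicon, lowercased stems
        lexicon.insert(entry)
    print(split_compound(lexicon, "datenbanksystem"))
```

Because the trie is built incrementally, new entries can be inserted while the system runs, and a later call to split_compound will take them into account. A real analyser would additionally need to handle linking elements and rank competing segmentations, which this sketch does not attempt.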


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence