Publication

A Shallow Text Processing Core Engine

Günter Neumann; Jakub Piskorski
In: Journal of Computational Intelligence, Vol. 18, No. 3, 2002.

Abstract

In this paper we present SPPC, a high-performance system for intelligent extraction of structured data from free-text documents. SPPC consists of a set of domain-adaptive shallow core components that are realized by means of cascaded weighted finite-state machines and generic dynamic tries. The system has been fully implemented for German; it includes morphological and on-line compound analysis, efficient POS filtering, high-performance named entity recognition, and chunk parsing based on a novel divide-and-conquer strategy. The approach as a whole proved to be very useful for processing free-word-order languages like German. SPPC is fast (more than 6000 words per second on standard PC environments) and achieves high linguistic coverage, especially for the divide-and-conquer parsing strategy, where we obtained an f-measure of 87.14% on unseen data.
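To make the mention of tries and on-line compound analysis more concrete, the following is a minimal sketch of how a trie-backed lexicon could be used to segment a compound into known stems. It is not SPPC's algorithm: the names `Trie`, `prefixes`, and `split_compound` are hypothetical, the lexicon is a toy, and the longest-match-with-backtracking strategy is an assumption chosen for brevity. It also ignores German linking elements and any weighting, both of which a real analyzer would need to handle.

```python
from typing import Dict, List, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False  # True if a lexicon entry ends at this node


class Trie:
    """Hypothetical toy lexicon; entries can be added at any time."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def prefixes(self, text: str, start: int) -> List[int]:
        """Return end offsets of all lexicon entries starting at text[start]."""
        ends: List[int] = []
        node = self.root
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_word:
                ends.append(i + 1)
        return ends


def split_compound(trie: Trie, word: str, start: int = 0) -> Optional[List[str]]:
    """Segment word[start:] into lexicon entries, or return None if impossible.

    Assumption for this sketch: try the longest matching entry first and
    backtrack to shorter ones if the remainder cannot be segmented.
    """
    if start == len(word):
        return []
    for end in reversed(trie.prefixes(word, start)):
        rest = split_compound(trie, word, end)
        if rest is not None:
            return [word[start:end]] + rest
    return None


lexicon = Trie()
for stem in ["daten", "bank", "datenbank", "system"]:
    lexicon.insert(stem)

print(split_compound(lexicon, "datenbanksystem"))
```

With this toy lexicon the longest entry is preferred at each position, so the word is split after `datenbank` instead of after `daten`. A word containing material outside the lexicon yields `None`.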