An Intelligent Text Extraction and Navigation System

Jakub Piskorski; Günter Neumann
In: Proceedings of the 6th International Conference on Computer-Assisted Information Retrieval (RIAO'00). International Conference on Computer-Assisted Information Retrieval (RIAO), 2000.


We present SPPC, a high-performance system for intelligent text extraction and navigation in German free-text documents. SPPC consists of a set of domain-independent shallow core components, which are realized by means of cascaded weighted finite-state machines and generic dynamic tries. All extracted information is represented uniformly in a single data structure (called the text chart) in a highly compact and linked form, in order to support indexing and navigation through the set of solutions. German text processing includes, among others, compound processing, high-performance named entity recognition, and chunk parsing based on a divide-and-conquer strategy. SPPC achieves good performance (4380 words per second in standard PC environments) and high linguistic coverage.
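To make the role of tries in compound processing more concrete, the following is a minimal sketch of a trie-backed lexicon used to split a German compound into known stems. It is an illustration only, not SPPC's actual implementation: the `Trie` class, the `split_compound` function, the longest-prefix-first heuristic, and the tiny lexicon are all assumptions introduced here, and the weighted finite-state machinery described in the abstract is not modelled.

```python
class Trie:
    """A plain (unweighted) character trie; hypothetical, for illustration."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        # Walk down the trie, creating nodes as needed.
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def prefixes(self, text, start=0):
        """Yield every end index e such that text[start:e] is a stored word."""
        node = self
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                return
            if node.is_word:
                yield i + 1


def split_compound(word, lexicon, start=0):
    """Return one segmentation of word into lexicon entries, or None."""
    if start == len(word):
        return []
    # Assumed heuristic: try the longest matching stem first, then backtrack.
    for end in sorted(lexicon.prefixes(word, start), reverse=True):
        rest = split_compound(word, lexicon, end)
        if rest is not None:
            return [word[start:end]] + rest
    return None


# Toy lexicon of lower-cased stems (assumed data, not from the paper).
lexicon = Trie()
for stem in ["daten", "bank", "system", "text"]:
    lexicon.insert(stem)

print(split_compound("datenbanksystem", lexicon))  # ['daten', 'bank', 'system']
```

A single pass over the trie enumerates all stems that start at a given position, which is why a trie suits this kind of lookup; a real system would additionally need to handle linking elements (such as the German "s" in "Arbeitsplatz") and to rank competing segmentations.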
