DFKI-LT - An Intelligent Text Extraction and Navigation System

Jakub Piskorski, Günter Neumann
An Intelligent Text Extraction and Navigation System
Proceedings of the 6th International Conference on Computer-Assisted Information Retrieval (RIAO'00), o.A., Paris, France, 2000
 
We present SPPC, a high-performance system for intelligent text extraction and navigation from German free-text documents. SPPC consists of a set of domain-independent shallow core components, which are realized by means of cascaded weighted finite-state machines and generic dynamic tries. All extracted information is represented uniformly in one data structure (called the text chart) in a highly compact and linked form in order to support indexing and navigation through the set of solutions. German text processing includes, among others, compound processing, high-performance named entity recognition, and chunk parsing based on a divide-and-conquer strategy. SPPC achieves good performance (4380 words per second on standard PC environments) and high linguistic coverage.
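To make the role of a trie-based lexicon in compound processing more concrete, here is a minimal sketch in Python. It is not SPPC's implementation: the names Trie, prefixes and split_compound, the toy lexicon, and the longest-match-first strategy are all assumptions chosen for illustration, and the sketch ignores weights, linking morphemes and the text chart entirely.

```python
from typing import Dict, List, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}
        self.is_word = False  # True if a lexicon entry ends at this node


class Trie:
    """A small trie that can grow at run time (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, word: str) -> None:
        node = self.root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True

    def prefixes(self, text: str, start: int) -> List[int]:
        """Return the end offsets of all lexicon words that begin at text[start]."""
        ends: List[int] = []
        node = self.root
        for i in range(start, len(text)):
            node = node.children.get(text[i])
            if node is None:
                break
            if node.is_word:
                ends.append(i + 1)
        return ends


def split_compound(trie: Trie, word: str) -> Optional[List[str]]:
    """Segment a word into lexicon entries, trying the longest prefix first.

    Returns None if no complete segmentation exists. This is plain
    backtracking without memoization, which is fine for a sketch.
    """
    word = word.lower()

    def solve(start: int) -> Optional[List[str]]:
        if start == len(word):
            return []
        for end in sorted(trie.prefixes(word, start), reverse=True):
            rest = solve(end)
            if rest is not None:
                return [word[start:end]] + rest
        return None

    return solve(0)


# Toy lexicon (assumed entries, not taken from SPPC's resources).
lexicon = Trie()
for entry in ["daten", "bank", "datenbank", "system"]:
    lexicon.insert(entry)

print(split_compound(lexicon, "Datenbanksystem"))
```

Because prefixes walks the trie once per start position, every lexicon word beginning at that position is found in a single pass over the characters, which is one reason tries are a common choice for this kind of lookup.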
 