

Document Structure Analysis Based on Layout and Textual Features

Stefan Klink; Andreas Dengel; Thomas Kieninger
In: IAPR International Workshop on Document Analysis Systems (DAS), 12/2000.


Document image processing is a crucial task in office automation. It starts with the OCR phase and continues with the more difficult steps of document analysis and understanding. This paper presents a hybrid and comprehensive approach to document structure analysis. It is hybrid in the sense that it uses both layout (geometrical) and textual features of a given document. These features form the basis for potential conditions, which in turn are used to express fuzzy-matched rules in an underlying rule base. Rules can be formulated from features observed within a single layout object, and they can also express dependencies between different layout objects. In addition to its rule-driven analysis, which allows easy adaptation to specific domains and their logical objects, the system includes domain-independent markup algorithms for common objects such as lists.
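To make the idea of fuzzy-matched rules concrete, here is a minimal sketch of how conditions over layout and textual features might be combined into a rule that labels layout objects. The abstract does not specify the paper's feature set, membership functions, or combination operator, so everything here is an assumption for illustration: the names `LayoutObject`, `ramp`, `Rule`, `classify`, and the three conditions are hypothetical, the thresholds are invented, and the fuzzy AND uses the minimum t-norm, which is one common choice but not necessarily the authors'. Two conditions look only at a single layout object, while `followed_by_body_text` expresses a dependency on a neighbouring object. The domain-independent list markup is not covered.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class LayoutObject:
    """A layout block with hypothetical geometric and textual features."""
    x: float          # left edge
    y: float          # top edge (y grows downward)
    width: float
    height: float
    font_size: float
    text: str


# Assumed interface: a condition maps (object, all objects on the page) to a degree in [0, 1].
Condition = Callable[[LayoutObject, Sequence[LayoutObject]], float]


def ramp(value: float, low: float, high: float) -> float:
    """Fuzzy membership rising linearly from 0 at `low` to 1 at `high`."""
    if value <= low:
        return 0.0
    if value >= high:
        return 1.0
    return (value - low) / (high - low)


# Intra-object conditions: features observed within one layout object.
def large_font(obj: LayoutObject, page: Sequence[LayoutObject]) -> float:
    return ramp(obj.font_size, 10.0, 16.0)


def short_text(obj: LayoutObject, page: Sequence[LayoutObject]) -> float:
    # 1.0 for up to 5 words, falling linearly to 0.0 at 15 words.
    return 1.0 - ramp(len(obj.text.split()), 5, 15)


# Inter-object condition: a dependency between different layout objects.
def followed_by_body_text(obj: LayoutObject, page: Sequence[LayoutObject]) -> float:
    below = [o for o in page if o is not obj and o.y > obj.y]
    if not below:
        return 0.0
    nearest = min(below, key=lambda o: o.y)
    gap = nearest.y - (obj.y + obj.height)  # vertical whitespace to the next block
    return 1.0 - ramp(gap, 5.0, 30.0)


@dataclass
class Rule:
    label: str
    conditions: List[Condition]

    def match(self, obj: LayoutObject, page: Sequence[LayoutObject]) -> float:
        # Fuzzy AND via the minimum t-norm (an assumed choice).
        return min(c(obj, page) for c in self.conditions)


def classify(page: Sequence[LayoutObject], rules: List[Rule],
             threshold: float = 0.5) -> List[Optional[str]]:
    """Give each object the label of its best-matching rule, or None below threshold."""
    labels: List[Optional[str]] = []
    for obj in page:
        best_label, best_score = None, 0.0
        for rule in rules:
            score = rule.match(obj, page)
            if score > best_score:
                best_label, best_score = rule.label, score
        labels.append(best_label if best_score >= threshold else None)
    return labels


# Usage on a made-up two-block page.
page = [
    LayoutObject(50, 40, 400, 20, 18.0, "Document Structure Analysis"),
    LayoutObject(50, 70, 400, 200, 10.0,
                 "Document image processing is a crucial task in office automation."),
]
rules = [Rule("heading", [large_font, short_text, followed_by_body_text])]
labels = classify(page, rules)
print(labels)  # -> ['heading', None]
```

The first block scores 1.0 on font size and text length and 0.8 on the 10-unit gap to the block below, so its rule degree is 0.8. The second block has a small font and nothing below it, so it scores 0.0. Adding a rule for a new domain only means writing new condition functions and a new `Rule`, which loosely mirrors the adaptability the abstract attributes to a rule-driven design.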