

Clustering and Classification of Document Structure - A Machine Learning Approach

Andreas Dengel; Frank Dubiel
In: Proceedings of the Third International Conference on Document Analysis and Recognition. International Conference on Document Analysis and Recognition (ICDAR-95), August 14-15, Montreal, QC, Canada, Pages 587-591, Vol. 2, ISBN 0-8186-7128-9, IEEE Computer Society, Washington, DC, USA, 1995.


We describe a system that learns how the logical structure of documents is presented, using business letters as an example. Given a set of instances, the system clusters them into structural concepts and induces a concept hierarchy. This hierarchy then serves as the basis for classifying future input. The paper introduces the individual learning steps, describes how the resulting concept hierarchy is applied to logical labeling, and reports the results.
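The abstract does not specify the clustering or classification procedure, so the following is only a minimal illustrative sketch of the general idea, not the authors' method. It assumes each layout block is described by a hypothetical two-dimensional feature vector (normalized x and y position on the page), swaps in plain centroid-based agglomerative clustering to build a hierarchy, and labels new input by descending that hierarchy. All names (`Concept`, `build_hierarchy`, `classify`) and the sample data are invented for illustration.

```python
from dataclasses import dataclass, field
from math import dist


@dataclass
class Concept:
    """A node in the (hypothetical) concept hierarchy."""
    centroid: tuple          # mean feature vector of all instances below this node
    size: int                # number of instances below this node
    labels: frozenset        # logical labels seen below this node
    children: list = field(default_factory=list)


def merge(a, b):
    """Combine two concepts into a parent concept with a size-weighted centroid."""
    n = a.size + b.size
    centroid = tuple(
        (x * a.size + y * b.size) / n for x, y in zip(a.centroid, b.centroid)
    )
    return Concept(centroid, n, a.labels | b.labels, [a, b])


def build_hierarchy(instances):
    """Agglomerative clustering: repeatedly merge the two closest concepts.

    `instances` is a list of (feature_vector, label) pairs.
    """
    if not instances:
        raise ValueError("need at least one instance")
    nodes = [Concept(tuple(v), 1, frozenset([label])) for v, label in instances]
    while len(nodes) > 1:
        pairs = (
            (i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))
        )
        i, j = min(
            pairs, key=lambda p: dist(nodes[p[0]].centroid, nodes[p[1]].centroid)
        )
        parent = merge(nodes[i], nodes[j])
        nodes = [n for k, n in enumerate(nodes) if k not in (i, j)] + [parent]
    return nodes[0]


def classify(root, vector):
    """Descend toward the nearest child until the concept carries a single label."""
    node = root
    while node.children and len(node.labels) > 1:
        node = min(node.children, key=lambda c: dist(c.centroid, vector))
    return node.labels


# Invented training data: (normalized x, normalized y) -> logical label.
examples = [
    ((0.10, 0.10), "sender"),
    ((0.12, 0.12), "sender"),
    ((0.10, 0.30), "recipient"),
    ((0.12, 0.32), "recipient"),
    ((0.50, 0.90), "signature"),
    ((0.55, 0.92), "signature"),
]

hierarchy = build_hierarchy(examples)
print(sorted(classify(hierarchy, (0.11, 0.29))))
print(sorted(classify(hierarchy, (0.50, 0.95))))
```

A real system would presumably use richer structural features and a more careful concept-formation criterion than nearest centroids; the sketch only shows how one induced hierarchy can serve both as a clustering result and as a classifier.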