Text versus non-Text Distinction in Online Handwritten Documents

Emanuel Indermühle, Horst Bunke, Faisal Shafait, Thomas Breuel

In: 25th ACM Symposium on Applied Computing (SAC-2010), Document Engineering Track, March 22-26, 2010, Sierre, Switzerland. ACM, 3/2010.


The aim of this paper is to explore how well the task of text vs. non-text distinction can be solved in online handwritten documents using only offline information. Two systems are introduced. The first system begins by segmenting the document. For this purpose, four methods originally developed for machine-printed documents are compared: x-y cut, morphological closing, Voronoi segmentation, and whitespace analysis. A state-of-the-art classifier then distinguishes between text and non-text zones. The second system follows a bottom-up approach that classifies connected components. Experiments are performed on a new dataset of online handwritten documents containing different content types in arbitrary arrangements. The best system assigns 94.3% of the pixels to the correct class.
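
The pixel-level score reported above can be illustrated with a minimal sketch. It assumes that only labelled (ink) pixels are counted and that background pixels are skipped; the abstract does not state the exact evaluation protocol, so this is an illustration rather than the authors' procedure. The function name pixel_accuracy, the label strings, and the toy grids are hypothetical.

```python
def pixel_accuracy(truth, predicted, ignore=None):
    """Fraction of labelled pixels whose predicted class matches the ground truth.

    truth and predicted are equally sized 2D lists of class labels.
    Pixels whose ground-truth label equals `ignore` (assumed here to mark
    background) are left out of the count.
    """
    correct = 0
    total = 0
    for truth_row, pred_row in zip(truth, predicted):
        for t, p in zip(truth_row, pred_row):
            if t == ignore:
                continue  # background pixel: not part of the score
            total += 1
            if t == p:
                correct += 1
    # Avoid division by zero for an image without any labelled pixel.
    return correct / total if total else 0.0


# Toy 2x3 example (hypothetical data): four ink pixels, one misclassified.
truth = [
    ["text", "text", None],
    ["non-text", None, "text"],
]
predicted = [
    ["text", "non-text", None],
    ["non-text", None, "text"],
]
print(pixel_accuracy(truth, predicted))
```

In this toy example three of the four labelled pixels agree with the ground truth. Scoring per pixel rather than per zone or per connected component allows a zone-based system and a component-based system to be compared on the same footing.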


Indermuehle-Online-Handwritten-Document-Segmentation-SAC10.pdf (pdf, 316 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence