Text versus non-Text Distinction in Online Handwritten Documents

Emanuel Indermühle; Horst Bunke; Faisal Shafait; Thomas Breuel
In: 25th ACM Symposium On Applied Computing, Document Engineering Track. ACM Symposium On Applied Computing (SAC-2010), March 22-26, Sierre, Switzerland, ACM, 3/2010.


The aim of this paper is to explore how well the task of text vs. non-text distinction can be solved in online handwritten documents using only offline information. Two systems are introduced. The first system begins by segmenting the document. For this purpose, four methods originally developed for machine-printed documents are compared: x-y cut, morphological closing, Voronoi segmentation, and whitespace analysis. A state-of-the-art classifier then distinguishes between text and non-text zones. The second system follows a bottom-up approach that classifies connected components. Experiments are performed on a new dataset of online handwritten documents containing different content types in arbitrary arrangements. The best system assigns 94.3% of the pixels to the correct class.
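To make the pixel-level figure concrete, the following is a minimal sketch of how such a score could be computed. The abstract does not describe the evaluation protocol, so everything here is an assumption for illustration: the function name pixel_accuracy, the label values TEXT and NON_TEXT, the toy label maps, and the choice to exclude background pixels from the count.

```python
# Hypothetical label values; the paper's actual encoding is not given in the abstract.
BACKGROUND, TEXT, NON_TEXT = 0, 1, 2


def pixel_accuracy(truth, predicted, background=BACKGROUND):
    """Return the fraction of foreground pixels whose predicted class matches the truth.

    Both arguments are equally sized 2-D lists of integer labels.
    Assumption: background pixels belong to neither class and are skipped.
    """
    correct = 0
    total = 0
    for truth_row, pred_row in zip(truth, predicted):
        for t, p in zip(truth_row, pred_row):
            if t == background:
                continue  # skip pixels that carry no ink
            total += 1
            if t == p:
                correct += 1
    if total == 0:
        raise ValueError("no foreground pixels to evaluate")
    return correct / total


# Toy ground truth: a text zone on the left, a non-text zone on the right.
truth = [
    [TEXT, TEXT, BACKGROUND, NON_TEXT],
    [TEXT, TEXT, BACKGROUND, NON_TEXT],
]
# Toy prediction with two misclassified ink pixels in the second row.
predicted = [
    [TEXT, TEXT, BACKGROUND, NON_TEXT],
    [TEXT, NON_TEXT, BACKGROUND, TEXT],
]

score = pixel_accuracy(truth, predicted)
print(f"{score:.1%} of foreground pixels assigned to the correct class")
```

In this toy case four of the six ink pixels are labeled correctly. Whether background pixels should count toward the total is a design choice that changes the score considerably on sparse handwritten pages, which is why it is flagged as an assumption here.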


