Background Variability Modeling for Statistical Layout Analysis

Faisal Shafait, Joost van Beusekom, Daniel Keysers, Thomas Breuel

In: Proceedings of the 19th International Conference on Pattern Recognition (ICPR-2008), December 8-11, Tampa, Florida, United States. IEEE, 2008.


Geometric layout analysis plays an important role in document image understanding. Many algorithms known in the literature work well on standard document images, achieving high text line segmentation accuracy on the UW-III dataset. These algorithms rely on certain assumptions about document layouts, and fail when their underlying assumptions are not met. Also, they do not provide confidence scores for their output. These two problems limit the usefulness of general-purpose layout analysis methods in large-scale applications. In this contribution, we propose a statistically motivated, model-based, trainable layout analysis system that allows assumption-free adaptation to different layout types and produces likelihood estimates of the correctness of the computed page segmentation. The performance of our approach is tested on a subset of the Google 1000 books dataset, where it achieved a text line segmentation accuracy of 98.4% on layouts where other general-purpose algorithms failed to produce a correct segmentation.


Further Links

2008-IUPR-07Aug_0818.pdf (pdf, 356 KB)

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence