Performance Comparison of Six Algorithms for Page Segmentation

Faisal Shafait, Daniel Keysers, Thomas Breuel

In: Horst Bunke, A. Lawrence Spitz (eds.). 7th IAPR Workshop on Document Analysis Systems (DAS). IAPR International Workshop on Document Analysis Systems (DAS), Nelson, pages 368-379, LNCS 3872, Springer, 2/2006.


This paper presents a quantitative comparison of six algorithms for page segmentation: X-Y cut, smearing, whitespace analysis, constrained text-line finding, Docstrum, and Voronoi-diagram-based segmentation. The evaluation uses a subset of the UW-III collection that is commonly used for this purpose, with a separate training set for parameter optimization. We compare the results obtained with default parameters and with optimized parameters. In the course of the evaluation, the strengths and weaknesses of each algorithm are analyzed, and it is shown that no single algorithm outperforms all the others. However, we observe that the three best-performing algorithms are those based on constrained text-line finding, Docstrum, and the Voronoi diagram.


German Research Center for Artificial Intelligence