Document Image Dewarping using Robust Estimation of Curled Text Lines

Adrian Ulges, Christoph Lampert, Thomas Breuel

In: International Conference on Document Analysis and Recognition (ICDAR), Seoul, pages 1001-1005. IEEE, 8/2005.


Digital cameras have become almost ubiquitous, and their use for fast and casual capturing of natural images is unchallenged. For making images of documents, however, they have not yet caught up with flatbed scanners, mainly because camera images tend to suffer from perspective distortion and are therefore of limited use for archival or OCR. For images of non-planar paper surfaces such as books, page curl causes additional distortion, which poses an even greater problem because it is nonlinear. This paper presents a new algorithm for removing both perspective and page curl distortion. It requires only a single camera image as input and relies on a priori layout information instead of additional hardware. It is therefore much more user-friendly than most previous approaches and allows for flexible ad hoc document capture. Results are presented showing that the algorithm produces visually pleasing output and increases OCR accuracy, giving it the potential to become a general-purpose preprocessing tool for camera-based document capture.

UlgesCHLTMBDocImageDewarping.pdf (PDF, 537 KB)

German Research Center for Artificial Intelligence