Publication

Dewarping of Camera-Captured Document Images

Syed Saqib Bukhari, Faisal Shafait, Thomas Breuel

In: CBDAR 2009 Online Proceedings. International Workshop on Camera-Based Document Analysis and Recognition (CBDAR-2009), July 25, Barcelona, Spain, pages 34-41, Online, 2009.

Abstract

Traditional OCR systems are designed for planar (dewarped) images, and their accuracy drops when they are applied to warped images. Developing new OCR techniques for warped images and developing dewarping techniques are therefore the two possible ways to improve OCR accuracy on camera-captured documents. Among the different types of dewarping techniques, those based on curled textline information are the most popular, but they are sensitive to high degrees of curl and variable line spacing. In this paper we build a novel dewarping approach based on curled textline information, which is extracted using a ridge-based modified active contour model (coupled-snakes). Our dewarping approach is less sensitive to different directions of curl and to variable line spacing. Experimental results show that the OCR error rate is reduced from 5.15% on warped documents to 1.92% on dewarped documents on the dataset of the CBDAR 2007 document image dewarping contest. We also report the performance of our method in comparison with other state-of-the-art methods.

German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz