Automatic Word Ground Truth Generation for Camera Captured Documents

Sheraz Ahmed, Koichi Kise, Masakazu Iwamura, Marcus Liwicki, Andreas Dengel

In: Technical Committee on Pattern Recognition and Media Understanding. Pattern Recognition and Media Understanding (PRMU-2013), located at CEATEC 2013, October 3-4, Makuhari Messe, Chiba, Japan. IEICE, 2013.


A database of camera captured documents is useful for training OCR systems to achieve better performance. However, no such dataset exists, because building one manually is laborious and costly. This paper proposes a fully automatic approach for building very large scale (i.e., millions of images) labeled datasets of camera captured documents. The approach requires no human intervention in labeling. Evaluation of samples generated by the approach shows that more than 97% of the images are correctly labeled. The novelty of the approach lies in its use of document image retrieval for automatic labeling, especially for camera captured documents, which contain camera-specific distortions such as blur and perspective distortion.
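
The abstract does not spell out how a retrieved document yields word labels, so the following is only an illustrative sketch under stated assumptions, not the authors' method. It assumes that retrieval has already matched the camera image to its electronic source page, that the source page provides word texts with axis-aligned bounding boxes, and that a 3x3 homography mapping page coordinates to camera image coordinates is available. The names `project_point`, `transfer_word_labels`, and the example matrix are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]
Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1) in page coordinates
Matrix = List[List[float]]               # 3x3 homography, row-major


def project_point(h: Matrix, p: Point) -> Point:
    """Map a page point into the camera image using homography h."""
    x, y = p
    w = h[2][0] * x + h[2][1] * y + h[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    px = (h[0][0] * x + h[0][1] * y + h[0][2]) / w
    py = (h[1][0] * x + h[1][1] * y + h[1][2]) / w
    return (px, py)


def transfer_word_labels(
    words: List[Tuple[str, Box]], h: Matrix
) -> List[Tuple[str, List[Point]]]:
    """Attach each word's text to its projected quadrilateral in the camera image.

    Assumption: `words` comes from the retrieved electronic page, so the
    text is known exactly and no manual labeling is needed.
    """
    labeled = []
    for text, (x0, y0, x1, y1) in words:
        # Corners in clockwise order starting at the top-left.
        corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
        labeled.append((text, [project_point(h, c) for c in corners]))
    return labeled


# Hypothetical example: the camera image is the page scaled by 2 and shifted.
example_h = [[2, 0, 10], [0, 2, 5], [0, 0, 1]]
example_words = [("camera", (0, 0, 4, 2))]
example_result = transfer_word_labels(example_words, example_h)
```

Each projected quadrilateral could then be cropped from the camera image and stored with its text as one labeled word sample. A real perspective distortion would use a homography with a non-trivial last row, which the same code handles.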

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence