Publication

Visual Appearance based Document Classification Methods: Performance Evaluation and Benchmarking

Syed Saqib Bukhari, Andreas Dengel

In: 13th International Conference on Document Analysis and Recognition (ICDAR-2015), August 23-26, Nancy, France. IEEE, 2015.

Abstract

Most traditional document image classification techniques concentrate on document segmentation and OCR analysis, despite the many complexities and limitations involved. Recently, many document image classification problems have been readily solved simply by adapting standard computer vision approaches for natural image retrieval and classification; these are referred to as visual appearance based document classification techniques. On proprietary datasets, these approaches have reported better results than the traditional approaches. However, they have not yet been compared with each other and, despite their potential, they have not been evaluated on distorted camera-captured documents, which is one of the challenging requirements in our present commercial document analysis projects. In this paper, we present simple and effective descriptions of different visual appearance based document image classification techniques. We compare their performance on various standard, publicly available datasets that differ in their degree of image degradation and content variation. We also demonstrate their advantages and limitations. Additionally, we make our implementations of these methods publicly available to the research community for use and further testing in other domains.

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence