Two Stream Deep Network for Document Image Classification

Muhammad Nabeel Asim, Muhammad Usman Ghani Khan, Muhammad Imran Malik, Khizar Razzaque, Andreas Dengel, Sheraz Ahmed

In: International Conference on Document Analysis and Recognition (ICDAR-2019), September 20-25, Sydney, Australia. IEEE, 9/2019.


This paper presents a novel two-stream approach for document image classification. The proposed approach leverages textual and visual modalities to classify document images into ten categories, including letter, memo, news article, etc. To reduce the dependency of the textual stream on the performance of the underlying OCR (a dependency shared by general content-based document image classifiers), we utilize a filter-based feature-ranking algorithm. This algorithm ranks the features of each class based on their ability to discriminate document images and selects the top 'K' features, which are retained for further processing. In parallel, the visual stream uses deep CNN models to extract structural features of document images. Finally, the outputs of the textual and visual streams are combined using an average ensembling method. Experimental results reveal that the proposed approach outperforms the state-of-the-art system by a significant margin of 4.5% on the publicly available Tobacco-3482 dataset.
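The two mechanisms named in the abstract, per-class top-'K' feature selection and average ensembling of the two streams, can be illustrated with a minimal sketch. The abstract does not specify the ranking score or the form of the stream outputs, so everything below is an assumption: the score (in-class document rate minus out-of-class document rate) is a simple stand-in filter, each stream is assumed to emit one probability per class, and the function names `top_k_features_per_class`, `average_ensemble`, and `predict` are hypothetical.

```python
from collections import Counter, defaultdict


def top_k_features_per_class(docs, labels, k):
    """Rank terms per class with a simple filter score and keep the top k.

    Assumption: the score below (in-class document rate minus out-of-class
    document rate) is a stand-in; the paper's actual ranking algorithm is
    not described in the abstract.
    """
    classes = sorted(set(labels))
    n_in = Counter(labels)
    # df[c][term] = number of class-c documents that contain the term
    df = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        df[label].update(set(tokens))
    vocab = set()
    for c in classes:
        vocab.update(df[c])

    selected = {}
    for c in classes:
        n_out = len(labels) - n_in[c]

        def score(term, c=c, n_out=n_out):
            in_rate = df[c][term] / n_in[c]
            out_count = sum(df[o][term] for o in classes if o != c)
            out_rate = out_count / n_out if n_out else 0.0
            return in_rate - out_rate

        # Highest score first; ties broken alphabetically for determinism.
        ranked = sorted(vocab, key=lambda t: (-score(t), t))
        selected[c] = ranked[:k]
    return selected


def average_ensemble(text_probs, visual_probs):
    """Average the per-class probabilities of the two streams."""
    if len(text_probs) != len(visual_probs):
        raise ValueError("both streams must score the same set of classes")
    return [(t + v) / 2 for t, v in zip(text_probs, visual_probs)]


def predict(text_probs, visual_probs, class_names):
    """Return the class with the highest averaged probability."""
    fused = average_ensemble(text_probs, visual_probs)
    best = max(range(len(fused)), key=fused.__getitem__)
    return class_names[best], fused
```

For example, `predict([0.5, 0.5], [0.25, 0.75], ["letter", "memo"])` averages the two streams and returns `"memo"`: the textual stream is undecided, so the visual stream's preference settles the outcome.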


German Research Center for Artificial Intelligence