Rethinking Semantic Segmentation for Table Structure Recognition in Documents

Muhammad Shoaib Ahmed Siddiqui, Pervaiz Iqbal Khan, Andreas Dengel, Sheraz Ahmed

In: Proceedings ICDAR'19. International Conference on Document Analysis and Recognition (ICDAR-2019), September 20-25, Sydney, Australia. Pages 1397-1402, ISBN 978-1-7281-3015-6, IEEE, 2019.


Building on recent advances in semantic segmentation, Fully-Convolutional Networks (FCN) have previously been applied successfully to table structure recognition. We analyze the efficacy of semantic segmentation networks for this purpose and simplify the problem by proposing prediction tiling, based on the consistency assumption that holds for tabular structures. For an image of dimensions H × W, we predict a single column for the rows (ŷ_row ∈ H) and a single row for the columns (ŷ_col ∈ W). We use a dual-headed architecture in which the initial feature maps (from the encoder-decoder model) are shared, while the last two layers generate class-specific (row/column) predictions. This allows a single model to generate predictions for rows and columns simultaneously, whereas previous methods relied on two separate models for inference. With the proposed method, we achieve state-of-the-art results on the ICDAR-13 image-based table structure recognition dataset, with an average F-Measure of 92.39% (91.90% and 92.88% F-Measure for rows and columns, respectively). The results suggest that constraining the problem space of an FCN by imposing valid constraints can lead to significant performance gains.
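
The tiling idea in the abstract can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function name tile_predictions, the binary label encoding, and the toy vectors are assumptions made for illustration only. The sketch shows how a length-H row prediction and a length-W column prediction, such as the two heads might emit, are each repeated along the other axis to recover full H × W masks.

```python
import numpy as np


def tile_predictions(row_pred, col_pred):
    """Tile 1-D predictions into full H x W masks (hypothetical helper).

    row_pred: length-H vector, one label per image row.
    col_pred: length-W vector, one label per image column.
    """
    row_pred = np.asarray(row_pred)
    col_pred = np.asarray(col_pred)
    height, width = row_pred.shape[0], col_pred.shape[0]

    # Consistency assumption: a table row spans the full width, so the
    # single predicted column is copied to every one of the W positions.
    row_mask = np.tile(row_pred[:, None], (1, width))

    # Likewise, a table column spans the full height, so the single
    # predicted row is copied to every one of the H positions.
    col_mask = np.tile(col_pred[None, :], (height, 1))
    return row_mask, col_mask


# Toy example with assumed binary labels (1 = inside a row/column region).
row_pred = np.array([0, 1, 1, 0])        # H = 4
col_pred = np.array([1, 0, 1, 1, 0, 1])  # W = 6
row_mask, col_mask = tile_predictions(row_pred, col_pred)
print(row_mask.shape, col_mask.shape)
```

Under this assumption the network only has to produce H + W values rather than 2 × H × W, which is one way to read the abstract's claim that constraining the problem space helps.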

German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz