anyAlign: An Intelligent and Interactive Text-Alignment Web-Application for Historical Document

Syed Saqib Bukhari, Manabendra Saha, Praveen Kumar Badimala Giridhara, Manesh Kumar Lohano, Andreas Dengel

In: The 13th IAPR International Workshop on Document Analysis Systems (DAS 2018), April 24-27, 2018, Vienna, Austria. IEEE, 2018.


Text alignment is an important, performance-determining step of an OCR system for printed and historical documents. As the number of available transcripts grows, it becomes important to align the text in document images with the corresponding transcripts, which is time- and labor-intensive work for many paleographers. We present an end-to-end, semi-automatic, interactive text alignment system for historical documents. OCRopus [14] is used for binarization and line segmentation of the historical document image. Text line segmentation followed by text alignment is done automatically by the system using ORB (Oriented FAST and Rotated BRIEF) local image feature descriptors. ORB features are matched using k-nearest neighbors (KNN). The system provides an interactive user interface for correcting wrong text segmentation and text alignment. The results are discussed in the evaluation section.
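
The KNN matching step for binary descriptors such as ORB can be illustrated with a minimal, dependency-free sketch. It is not the anyAlign implementation: the function names (`hamming`, `knn_match`, `ratio_filter`), the toy 2-byte descriptors (real ORB descriptors are typically 32 bytes), and the ratio-test filter with a threshold of 0.75 are all assumptions made for illustration. The abstract only states that ORB features are matched by KNN.

```python
from typing import List, Tuple


def hamming(a: bytes, b: bytes) -> int:
    """Count differing bits between two equal-length binary descriptors."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def knn_match(query: List[bytes], train: List[bytes],
              k: int = 2) -> List[List[Tuple[int, int]]]:
    """For each query descriptor, return its k nearest train descriptors
    as (train_index, distance) pairs, nearest first."""
    matches = []
    for q in query:
        dists = sorted((hamming(q, t), j) for j, t in enumerate(train))
        matches.append([(j, d) for d, j in dists[:k]])
    return matches


def ratio_filter(matches: List[List[Tuple[int, int]]],
                 ratio: float = 0.75) -> List[Tuple[int, int]]:
    """Assumed post-filter (not stated in the source): keep a match only if
    the best distance is clearly smaller than the second-best distance."""
    good = []
    for i, cand in enumerate(matches):
        if not cand:
            continue
        if len(cand) == 1 or cand[0][1] < ratio * cand[1][1]:
            good.append((i, cand[0][0]))
    return good


# Hypothetical toy descriptors: two from a word image, three from a text line.
word_descs = [bytes([0b11110000, 0b00001111]), bytes([0b10101010, 0b01010101])]
line_descs = [bytes([0b10101010, 0b01010111]),
              bytes([0b00000000, 0b11111111]),
              bytes([0b11110000, 0b00001110])]

pairs = ratio_filter(knn_match(word_descs, line_descs, k=2))
print(pairs)  # (query_index, train_index) pairs that pass the filter
```

Hamming distance is the usual metric for binary descriptors because each descriptor is a bit string. With k = 2, the second-nearest neighbor gives a baseline for rejecting ambiguous matches.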

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence