OCR Error Correction: State-of-the-art vs An NMT Based Approach

Kareem Mokhtar; Syed Saqib Bukhari; Andreas Dengel

In: DAS. IAPR International Workshop on Document Analysis Systems (DAS-2018), April 24-27, Vienna, Austria, IEEE, 2018.


Although state-of-the-art OCR systems perform very well, they can still introduce errors for various reasons, and when it comes to historical documents with old manuscripts, the performance of such systems gets even worse. That is why post-OCR error correction has been an open problem for many years. Many state-of-the-art approaches have been introduced in recent years. This paper contributes to the field of post-OCR error correction by introducing two novel deep learning approaches to improve the accuracy of OCR systems, and a post-processing technique that can further enhance the quality of the output results. These approaches are based on Neural Machine Translation and were motivated by the great success that deep learning has brought to the field of Natural Language Processing. Finally, we compare the state-of-the-art approaches in post-OCR error correction with the newly introduced systems and discuss the results.

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence