Automatic Transcription of Handwritten Medieval Documents

Andreas Fischer; Markus Wuthrich; Marcus Liwicki; Volkmar Frinken; Horst Bunke; Gabriel Viehhauser; Michael Stolz
In: Proceedings of the 15th International Conference on Virtual Systems and MultiMedia. International Conference on Virtual Systems and MultiMedia (VSMM-2009), September 9-12, Vienna, Austria, Pages 137-142, ISBN 978-0-7695-3790-0, IEEE Computer Society, 2009.


The automatic transcription of historical documents is vital for the creation of digital libraries. In order to make images of valuable old documents amenable to browsing, a transcription of high accuracy is needed. In this paper, two state-of-the-art recognizers originally developed for modern scripts are applied to medieval documents. The first is based on Hidden Markov Models and the second uses a Neural Network with a bidirectional Long Short-Term Memory. On a dataset of word images extracted from a medieval manuscript of the 13th century, written in Middle High German by several writers, it is demonstrated that a word accuracy of 93.32% is achievable. This is far above the word accuracy of 77.12% achieved with the same recognizers for unconstrained modern scripts written in English. These results encourage the development of real-world systems for automatic transcription of historical documents with a view to image and text browsing in digital libraries.


