The OCRopus Open Source OCR System

Thomas Breuel

In: B.A. Yanikoglou, K. Berkner (Eds.). Proceedings of Document Recognition and Retrieval XV, IS&T/SPIE 20th Annual Symposium 2008. SPIE Conference on Document Recognition and Retrieval (DRR-2008), January 16-31, San Jose, CA, United States. Vol. 6815, SPIE, 2008.


OCRopus is a new, open source OCR system emphasizing modularity, easy extensibility, and reuse, aimed at both the research community and large-scale commercial document conversions. This paper describes the current status of the system, its general architecture, and the major algorithms currently used for layout analysis and text line recognition.

