Publication

High Level Document Analysis Guided by Geometric Aspects

Andreas Dengel, G. Barth

In: International Journal on Pattern Recognition and Artificial Intelligence (IJPRAI), 2(4), pages 641-656, World Scientific Publishing, 12/1988.

Abstract

The realization of the paper-free office seems to be more difficult than expected. Therefore, good paper-computer interfaces are necessary to transform paper documents into an electronic form that allows the use of a filing and retrieval system. An electronic document page is an optically scanned and digitized representation of a printed page. Document analysis is the problem of interpreting and labeling the constituents of the document. Although there are very reliable optical character recognition (OCR) methods, the process can be very inefficient. To prune the search space and become more efficient, search-supporting methods have to be developed. This article proposes an approach to identify the layout of a document page by dividing it recursively into nested rectangular areas. The procedure is used as a basis for a document layout model, which is able to control an automatic interpretation mechanism for deriving a high-level representation of the contents of a document. We have implemented our method in Common Lisp on a Symbolics 3640 workstation and have run it on a large population of office documents. The results obtained have been very encouraging and have convincingly confirmed the soundness of our approach.
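
To illustrate what "dividing a page recursively into nested rectangular areas" can look like, here is a minimal sketch in Python. It is not the authors' Common Lisp implementation, and the abstract does not specify their splitting criterion. The sketch assumes a binarized page (1 = ink, 0 = background) and a generic rule: cut at the widest empty band of rows or columns. The names `trim`, `widest_gap`, `segment`, `leaves`, and the `min_gap` parameter are all hypothetical.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]            # assumed input: 1 = ink, 0 = background
Box = Tuple[int, int, int, int]   # (top, left, bottom, right), bottom/right exclusive


def trim(page: Grid, box: Box) -> Optional[Box]:
    """Shrink a box to the bounding box of its ink; None if it holds no ink."""
    top, left, bottom, right = box
    rows = [r for r in range(top, bottom) if any(page[r][left:right])]
    if not rows:
        return None
    cols = [c for c in range(left, right)
            if any(page[r][c] for r in range(top, bottom))]
    return (rows[0], cols[0], rows[-1] + 1, cols[-1] + 1)


def widest_gap(profile: List[int], min_gap: int) -> Optional[Tuple[int, int]]:
    """Widest run of zeros of length >= min_gap, as (start, end), end exclusive.

    The profile comes from a trimmed box, so its first and last entries are
    non-zero and every zero run is interior.
    """
    best, start = None, None
    for i, value in enumerate(profile):
        if value == 0 and start is None:
            start = i
        elif value != 0 and start is not None:
            if i - start >= min_gap and (best is None or i - start > best[1] - best[0]):
                best = (start, i)
            start = None
    return best


def segment(page: Grid, box: Box, min_gap: int = 2) -> Optional[dict]:
    """Recursively split a box at its widest empty band; return a nested tree."""
    trimmed = trim(page, box)
    if trimmed is None:
        return None
    top, left, bottom, right = trimmed
    row_profile = [sum(page[r][left:right]) for r in range(top, bottom)]
    col_profile = [sum(page[r][c] for r in range(top, bottom))
                   for c in range(left, right)]
    row_gap = widest_gap(row_profile, min_gap)
    col_gap = widest_gap(col_profile, min_gap)
    node = {"box": trimmed, "children": []}
    if row_gap is None and col_gap is None:
        return node  # no empty band wide enough: this rectangle is a leaf
    use_rows = col_gap is None or (
        row_gap is not None
        and row_gap[1] - row_gap[0] >= col_gap[1] - col_gap[0])
    if use_rows:
        a, b = row_gap
        parts = [(top, left, top + a, right), (top + b, left, bottom, right)]
    else:
        a, b = col_gap
        parts = [(top, left, bottom, left + a), (top, left + b, bottom, right)]
    children = (segment(page, part, min_gap) for part in parts)
    node["children"] = [child for child in children if child is not None]
    return node


def leaves(node: dict) -> List[Box]:
    """Collect the boxes of all leaf rectangles, in split order."""
    if not node["children"]:
        return [node["box"]]
    return [box for child in node["children"] for box in leaves(child)]
```

Calling `segment(page, (0, 0, height, width))` on a page with a full-width header above two text columns would yield a tree whose root has two children: the header rectangle and a lower rectangle that is itself split into the two columns. A layout model of the kind the abstract describes could then attach labels to such nested rectangles, though how the article does this is not stated here.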

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence