On the Evaluation of Document Analysis Components by Recall, Precision, and Accuracy

Markus Junker, Rainer Hoch, Andreas Dengel

In: Proceedings of the 5th International Conference on Document Analysis and Recognition (ICDAR-99), September 20-22, Bangalore, India, pages 713-716, ISBN 0-7695-0318-7, IEEE Computer Society, Washington, DC, USA, 1999.


In document analysis, it is common to prove the usefulness of a component by an experimental evaluation. By applying the respective algorithms to a test sample, effectiveness measures such as recall, precision, and accuracy are computed. The goal of such an evaluation is twofold: on the one hand, it shows that the absolute effectiveness of the algorithm is acceptable for practical use. On the other hand, the evaluation can show whether the algorithm is more or less effective than another algorithm. In this paper we argue that experimental evaluation on relatively small test sets, as is very common in document analysis, has to be treated with extreme care from a statistical point of view. In fact, it is surprising how weak the statements derived from such evaluations are.
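
As a rough illustration of why small test sets support only weak statements, the following sketch computes a confidence interval for an accuracy measured on a small sample. It is not taken from the paper: the choice of the Wilson score interval, the function name `wilson_interval`, the 95% level (z = 1.96), and the counts of 90 and 85 correct decisions out of 100 are all assumptions made here for illustration.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Two-sided Wilson score interval for a binomial proportion.

    z = 1.96 corresponds to roughly 95% confidence (an assumed level).
    """
    if trials <= 0:
        raise ValueError("trials must be positive")
    p_hat = successes / trials
    denom = 1 + z * z / trials
    center = (p_hat + z * z / (2 * trials)) / denom
    half_width = (
        z
        * math.sqrt(p_hat * (1 - p_hat) / trials + z * z / (4 * trials * trials))
        / denom
    )
    return center - half_width, center + half_width


# Hypothetical counts: two algorithms evaluated on 100 test documents each.
interval_a = wilson_interval(90, 100)  # algorithm A: 90 correct
interval_b = wilson_interval(85, 100)  # algorithm B: 85 correct

print("A: %.3f to %.3f" % interval_a)
print("B: %.3f to %.3f" % interval_b)
```

Under these assumptions the two intervals overlap considerably, so a five-point accuracy difference on 100 documents gives little evidence that one algorithm is better. Overlapping intervals are only an informal indication, not a significance test.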

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence