
Publication

Page Frame Detection for Marginal Noise Removal from Scanned Documents

Faisal Shafait; Joost van Beusekom; Daniel Keysers; Thomas Breuel
In: Proceedings of 15th SCIA 2007. Scandinavian Conference on Image Analysis (SCIA-2007), 15th, June 10-14, Aalborg, Denmark, Pages 651-660, Lecture Notes in Computer Science (LNCS), Vol. 4522 / 2007, Springer, 6/2007.

Abstract

We describe and evaluate a method to robustly detect the page frame in document images, locating the actual page contents area and removing textual and non-textual noise along the page borders. We use a geometric matching algorithm to find the optimal page frame, which has the advantages of not assuming the existence of whitespace between noisy borders and actual page contents, and of giving a practical solution to the page frame detection problem without the need for parameter tuning. We define suitable performance measures and evaluate the algorithm on the UW-III database. The results show that the error rates are below 4% for each of the performance measures used. In addition, we demonstrate that the use of page frame detection reduces the OCR error rate by removing textual noise. Experiments using a commercial OCR system show that the error rate due to elements outside the page frame is reduced from 4.3% to 1.7% on the UW-III dataset.
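As a rough illustration of how a detected page frame can be used to discard marginal noise, the sketch below keeps only those connected-component bounding boxes that lie entirely inside a given frame. This is not the paper's geometric matching algorithm, which is what finds the frame in the first place. The names `Box`, `inside`, and `remove_marginal_noise`, the coordinate convention, and the strict containment rule are all assumptions made for illustration.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned rectangle in pixel coordinates (hypothetical convention)."""
    x0: int  # left
    y0: int  # top
    x1: int  # right
    y1: int  # bottom


def inside(frame: Box, box: Box) -> bool:
    """Return True if `box` lies entirely within `frame`."""
    return (
        frame.x0 <= box.x0
        and frame.y0 <= box.y0
        and box.x1 <= frame.x1
        and box.y1 <= frame.y1
    )


def remove_marginal_noise(frame: Box, components: List[Box]) -> List[Box]:
    """Keep only components fully contained in the page frame.

    Assumption: anything touching or outside the frame is treated as
    border noise. A real system might use a softer overlap criterion.
    """
    return [c for c in components if inside(frame, c)]


# Example: a frame assumed to come from some page frame detector.
frame = Box(100, 100, 900, 1200)
components = [
    Box(120, 150, 400, 180),  # text line inside the page frame
    Box(0, 0, 50, 1300),      # dark scanning border on the left edge
    Box(880, 300, 950, 330),  # text from a neighbouring page, straddling the frame
]
kept = remove_marginal_noise(frame, components)
```

Only the components in `kept` would then be passed on to OCR, which is the mechanism by which text from outside the page frame stops contributing recognition errors.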
