

Testing and Benchmarking Large-Scale Machine Learning Systems

Thomas Breuel
In: Proceedings of the Snowbird Learning Workshop 2007. Snowbird Workshop on Learning, March 19-22, Puerto Rico, USA, Snowbird, 2007.


Our research lab is currently developing a number of large-scale pattern recognition systems incorporating novel machine learning algorithms, including an adaptive OCR engine for the Google Book Search project and a real-time network monitoring analysis system for a large telecom provider. We have found existing toolboxes (e.g., R, SPIDER) to be inadequate to support the data management, benchmarking, model selection, and validation necessary for the development of such large-scale systems. In addition, we find that benchmarking results reported in the literature frequently lack sound control experiments and important test cases.