Prediction of Classifier Training Time Including Parameter Optimization
Matthias Reif; Faisal Shafait; Andreas Dengel
In: Joscha Bach; Stefan Edelkamp (Eds.). Proceedings of the 34th Annual German Conference on Artificial Intelligence. German Conference on Artificial Intelligence (KI-11), 34th, October 4-7, Berlin, Germany, Pages 260-271, Lecture Notes in Computer Science (LNCS), Vol. 7006, ISBN 978-3-642-24454-4, Springer, Berlin, Heidelberg, 10/2011.
Besides classification performance, training time is a second important factor that affects the suitability of a classification algorithm for an unknown dataset. An algorithm with slightly lower accuracy may be preferred if its training time is significantly lower. Additionally, an estimate of the training time required for a pattern recognition task is very useful if the result has to be available within a certain amount of time. Meta-learning is often used to predict the suitability or performance of classifiers using different learning schemes and features. Landmarking features in particular have been used very successfully in the past: the accuracy of simple learners is used to predict the performance of a more sophisticated algorithm. In this work, we investigate the quantitative prediction of the training time for several target classifiers. Different sets of meta-features are evaluated according to their suitability for predicting the actual run-times of a parameter optimization by grid search. Additionally, we adapt the concept of landmarking to time prediction: instead of their accuracy, the run-times of simple learners are used as feature values. We evaluated the approach on real-world datasets from the UCI Machine Learning Repository and StatLib. The run-times of five different classification algorithms are predicted and evaluated using two different performance measures. The promising results show that the approach is able to reasonably predict the training time including parameter optimization. Furthermore, different sets of meta-features seem to be necessary for different target algorithms in order to achieve the highest prediction performance.
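The idea of time landmarking described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the two simple learners (`train_majority`, `train_nearest_centroid`), the helper `time_landmarks`, the `repeats` parameter, and the toy data are all hypothetical choices made for this example. The sketch only shows how the training times of cheap learners could be collected as meta-feature values; in the paper's setting such values would then feed a model that predicts the grid-search run-time of a target classifier.

```python
import time
from collections import Counter


def train_majority(X, y):
    """Hypothetical simple learner: always predicts the most frequent class."""
    return Counter(y).most_common(1)[0][0]


def train_nearest_centroid(X, y):
    """Hypothetical simple learner: stores one mean vector per class."""
    sums, counts = {}, Counter(y)
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for i, value in enumerate(row):
            acc[i] += value
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def time_landmarks(X, y, learners, repeats=3):
    """Measure each learner's training time in seconds (best of `repeats` runs)."""
    features = {}
    for name, train in learners.items():
        timings = []
        for _ in range(repeats):
            start = time.perf_counter()
            train(X, y)
            timings.append(time.perf_counter() - start)
        features[name] = min(timings)
    return features


# Toy dataset, made up for illustration only.
X = [[0.0, 1.0], [1.0, 0.0], [0.9, 0.1], [0.1, 0.8]]
y = ["a", "b", "b", "a"]

LEARNERS = {"majority": train_majority, "nearest_centroid": train_nearest_centroid}

# One run-time value per simple learner; these act as the time landmarkers.
meta_features = time_landmarks(X, y, LEARNERS)
```

Taking the minimum over a few repetitions is one common way to reduce timing noise; it is an assumption of this sketch, not something stated in the abstract.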