On General Purpose Time Series Similarity Measures and Their Use as Kernel Functions in Support Vector Machines

Helmuth Pree, Benjamin Herwig, Thiemo Gruber, Bernhard Sick, Klaus David, Paul Lukowicz

In: Information Sciences 281, pages 478-495, Elsevier, 10/2014.


The article addresses the problem of temporal data mining, in particular classification, with support vector machines (SVMs). If no application-specific knowledge about the nature of the time series is available, general purpose time series similarity measures can be used as kernel functions in SVMs. The article compares several possible similarity measures, namely the linear Euclidean, triangle, polynomial probabilistic (with two variants), and shape space distances (SSD), as well as the nonlinear measures dynamic time warping (DTW), longest common subsequences, and time warp edit distance (TWED). Nonlinear (i.e., "elastic") measures take a nonlinear scaling of the time series in the time domain into account. First, these measures are used in combination with a nearest neighbor classifier; then the same similarity measures are used to compute the kernel matrices for SVMs. Simulation experiments with twenty publicly available benchmark data sets show that, with regard to classification accuracy, TWED performs very well overall, while SSD is the best linear measure. SSD has the lowest run-times, and the fastest nonlinear measure is DTW. These claims are further investigated by applying statistical tests. With the results presented in this article, together with results from related investigations, we want to support practitioners and scholars in answering the following question: Which measure should be looked at first if accuracy is the most important criterion, if an application is time-critical, or if a compromise is needed?
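To illustrate the idea of turning an elastic similarity measure into a kernel matrix, the following is a minimal sketch, not the authors' implementation. It computes a textbook DTW distance with an absolute-difference local cost and converts it into a similarity via exp(-gamma * d). The function names, the local cost, the exponential transform, and the value of `gamma` are all assumptions made for illustration; the abstract does not state how the kernels are constructed. Note that a matrix built this way is not guaranteed to be positive semidefinite.

```python
import math


def dtw_distance(a, b):
    """Textbook dynamic time warping distance (assumed local cost: |x - y|)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of the three admissible predecessor alignments.
            cost[i][j] = local + min(
                cost[i - 1][j],      # a[i-1] matched again with b[j-1] after b[j-1] was already used
                cost[i][j - 1],      # b[j-1] matched again with a[i-1] after a[i-1] was already used
                cost[i - 1][j - 1],  # both sequences advance
            )
    return cost[n][m]


def kernel_matrix(series, distance, gamma=1.0):
    """Hypothetical helper: pairwise similarities exp(-gamma * distance)."""
    return [[math.exp(-gamma * distance(x, y)) for y in series] for x in series]


# Toy data (invented for illustration): the second series is a time-stretched
# copy of the first, the third is a constant series.
series = [
    [0, 1, 2, 1, 0],
    [0, 0, 1, 2, 1, 0],
    [2, 2, 2, 2, 2],
]
K = kernel_matrix(series, dtw_distance, gamma=0.5)
```

Because DTW absorbs the stretching in the time domain, the first two series have distance 0 and therefore similarity 1, whereas the constant series is clearly less similar to both. A matrix like `K` could then be handed to an SVM implementation that accepts precomputed kernels, for example scikit-learn's `SVC(kernel="precomputed")`.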
