Comparison of Data Selection Strategies For Online Support Vector Machine Classification

Mario Michael Krell, Nils Wilshusen, Andrei Cristian Ignat, Su-Kyoung Kim

In: Proceedings of the International Congress on Neurotechnology, Electronics and Informatics (NEUROTECHNIX-2015), November 16-17, Lisbon, Portugal, pages 59-67, SciTePress, 11/2015.


Practical applications of support vector machines (SVMs) often require the capability to perform online learning with limited computational resources. SVMs can be enabled for online learning through several strategies. One group of strategies manipulates the training data and limits its size. We aim to summarize these existing approaches and compare them, firstly, on several synthetic datasets with different shifts and, secondly, on electroencephalographic (EEG) data. During the manipulation, class imbalance can occur in the training data, and it might even happen that all samples of one class are removed. To deal with this potential issue, we suggest and compare three balancing criteria. Results show that there is a complex interaction between the different groups of selection criteria, which can be combined arbitrarily. For different data shifts, different criteria are appropriate. Adding all samples to the pool of considered samples usually performs significantly worse than other criteria. Balancing the data is helpful for EEG data. For the synthetic data, balancing criteria were mostly relevant when the other criteria were not well chosen.
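To illustrate the general idea of limiting the training set size while guarding against losing a class, here is a minimal sketch. It is not taken from the paper: the class name `BalancedSamplePool`, the "oldest first" removal rule, and the specific balancing rule (only evict from the currently most frequent class) are assumptions chosen for illustration and are not necessarily among the criteria the authors compare. Retraining the SVM on the pool contents is left out.

```python
from collections import Counter, deque


class BalancedSamplePool:
    """Fixed-size pool of (features, label) pairs for retraining a classifier.

    Hypothetical illustration only: samples are evicted oldest first, but
    only from the currently most frequent class, so no class is emptied.
    """

    def __init__(self, max_size):
        if max_size < 2:
            raise ValueError("max_size must be at least 2 to hold two classes")
        self.max_size = max_size
        self.samples = deque()  # the oldest sample sits at the left end

    def class_counts(self):
        """Return how many samples of each label are currently in the pool."""
        return Counter(label for _, label in self.samples)

    def add(self, features, label):
        """Add a new sample, then evict one if the pool exceeds max_size."""
        self.samples.append((features, label))
        if len(self.samples) > self.max_size:
            self._remove_oldest_of_majority_class()

    def _remove_oldest_of_majority_class(self):
        counts = self.class_counts()
        majority_label = max(counts, key=counts.get)
        # Scan from oldest to newest and drop the first majority-class sample.
        for index, (_, label) in enumerate(self.samples):
            if label == majority_label:
                del self.samples[index]
                return


# Usage: a stream dominated by class 1 with a single early sample of class -1.
pool = BalancedSamplePool(max_size=4)
stream = [([0.1], 1), ([0.2], 1), ([0.9], -1), ([0.3], 1),
          ([0.4], 1), ([0.5], 1), ([0.6], 1)]
for features, label in stream:
    pool.add(features, label)
print(pool.class_counts())
```

With a plain "remove the oldest sample" rule, the single class -1 sample in this stream would eventually be evicted and the pool would contain only one class. The balancing guard in this sketch keeps it in the pool.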


Further Links

data_handling.pdf (pdf, 0 B)

German Research Center for Artificial Intelligence
Deutsches Forschungszentrum für Künstliche Intelligenz