Nearest-Neighbor and Clustering based Anomaly Detection Algorithms for RapidMiner

Mennatallah Amer; Markus Goldstein
In: Simon Fischer; Ingo Mierswa (eds.). Proceedings of the 3rd RapidMiner Community Meeting and Conference (RCOMM 2012). RapidMiner Community Meeting and Conference (RCOMM-2012), August 28-31, Budapest, Hungary, Pages 1-12, ISBN 978-3-8440-0995-8, Shaker Verlag GmbH, Aachen, 8/2012.


Unsupervised anomaly detection is the process of finding outlying records in a given dataset without the need for prior training. In this paper we introduce an anomaly detection extension for RapidMiner that assists non-experts in applying eight different nearest-neighbor and clustering-based algorithms to their data. A focus on efficient implementation and smart parallelization guarantees its practical applicability. In the context of clustering-based anomaly detection, two new algorithms are introduced. The first is a global variant of the cluster-based local outlier factor (CBLOF), which tries to compensate for the shortcomings of the original method. The second is the local density cluster-based outlier factor (LDCOF), which takes the local variances of clusters into account. The performance of all algorithms has been evaluated on real-world datasets from the UCI machine learning repository. The results reveal the strengths and weaknesses of the individual algorithms and show that our proposed clustering-based algorithms outperform CBLOF significantly.
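
To illustrate the general idea of unsupervised, nearest-neighbor based anomaly scoring mentioned in the abstract, the following is a minimal Python sketch. It is not the RapidMiner extension's implementation and does not reproduce CBLOF or LDCOF. It assumes a plain brute-force approach in which each record is scored by its mean Euclidean distance to its k nearest neighbors; the function name `knn_anomaly_scores`, the parameter `k`, and the toy data are hypothetical and chosen only for illustration.

```python
import math


def knn_anomaly_scores(points, k=2):
    """Score each point by the mean Euclidean distance to its k nearest neighbors.

    Illustrative assumption: brute-force O(n^2) search, no training step.
    A higher score means the record lies farther from its neighbors.
    """
    if not 1 <= k < len(points):
        raise ValueError("k must be between 1 and len(points) - 1")
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other record, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(sum(dists[:k]) / k)
    return scores


# Toy data (hypothetical): four records on a unit square plus one distant record.
data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
scores = knn_anomaly_scores(data, k=2)

# Index of the record with the highest anomaly score.
most_anomalous = max(range(len(data)), key=scores.__getitem__)
```

In this toy dataset the distant record receives the highest score, since no labels or training phase are involved and the ranking depends only on distances within the dataset itself.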
