Multiple Views of Grid Based Outlier Detection for Combine Harvester

Ying Gu

3rd Conference on Machine Learning for Cyberphysical Systems (ML4CPS), Lemgo, 26-27 October 2017.


Outlier detection is one of the most widely used techniques for identifying unusual behavior in raw data. Although many outlier detection algorithms already exist, the growth of the Internet of Things over the last ten years and rapidly evolving sensor and storage technologies call for techniques that can process high volumes of data quickly. In the AGATA project, we deal with large data sets collected from combine harvester sensors [GBSM16]. Traditional anomaly detection algorithms face a series of problems:

- Most distance-based methods have quadratic run-time complexity [Gol14].
- Many methods cannot handle unordered nominal attributes, yet in practical applications sensor data very often contain non-numerical attributes.
- Clustering-based and depth-based anomaly detection algorithms need to make assumptions about the statistical distribution of the underlying data set.
- Whether a data point is an outlier depends on the perspective from which the data are analyzed, but most distance-based algorithms deliver one-sided results based on the distance between neighboring points.

To address these problems, a Grid Based Outlier Detection algorithm, called GBOD, was proposed [GGB+17]. It has linear complexity and makes no assumption about the distribution of the underlying data set. This paper extends the algorithm. We demonstrate

1. how to deal with unordered nominal attributes, and
2. different views of outliers:
   (a) the direct view (density-based outliers), which provides results similar to those of k-nearest-neighbor-based anomaly detection algorithms,
   (b) the grid distribution view, which identifies outliers based on the grid density histogram, and
   (c) the center-based view, which identifies outliers based on the deviation of the center of gravity of the data points from the geometric center of each grid cell.
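The grid idea and the three views can be illustrated with a minimal sketch. This is not the GBOD implementation from [GGB+17]; every name (`build_grid`, `direct_view`, `distribution_view`, `center_view`) and every rule below is a hypothetical choice made for illustration. It assumes equal-width bins for numeric attributes, encodes an unordered nominal attribute by using its category value directly as a cell coordinate, approximates the k-NN-like direct view by the record count of a single cell, takes the grid distribution cutoff from the histogram of cell densities, and measures the center-based deviation in units of cell width. The thresholds `min_pts`, `fraction` and `tau` are likewise assumptions. Building the grid needs only a fixed number of passes over the data, so it is linear in the number of records.

```python
import math
from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass
class Grid:
    keys: list      # cell key of every record, in input order
    count: Counter  # cell key -> number of records in that cell
    sums: dict      # cell key -> coordinate sums of the numeric attributes
    numeric: list   # indices of the numeric attributes
    lo: dict        # attribute index -> lower bound of the grid
    width: dict     # attribute index -> width of one cell


def build_grid(data, bins, nominal=frozenset()):
    """Hypothetical helper: assign every record to a grid cell (linear in n).

    Numeric attributes are cut into `bins` equal-width intervals; a nominal
    attribute uses its category value as the cell coordinate (an assumed
    encoding), so every category gets its own slice of the grid.
    """
    dims = len(data[0])
    numeric = [i for i in range(dims) if i not in nominal]
    lo = {i: min(r[i] for r in data) for i in numeric}
    hi = {i: max(r[i] for r in data) for i in numeric}
    # A constant attribute would give width 0; fall back to 1.0 in that case.
    width = {i: (hi[i] - lo[i]) / bins or 1.0 for i in numeric}

    def cell_of(r):
        return tuple(
            r[i] if i in nominal
            else min(int((r[i] - lo[i]) / width[i]), bins - 1)  # clamp the maximum
            for i in range(dims)
        )

    keys = [cell_of(r) for r in data]
    sums = defaultdict(lambda: [0.0] * len(numeric))
    for r, k in zip(data, keys):
        for j, i in enumerate(numeric):
            sums[k][j] += r[i]
    return Grid(keys, Counter(keys), dict(sums), numeric, lo, width)


def direct_view(g, min_pts):
    """Assumed density rule: flag records whose cell holds fewer than min_pts records."""
    return [n for n, k in enumerate(g.keys) if g.count[k] < min_pts]


def distribution_view(g, fraction):
    """Assumed histogram rule: flag records in the sparsest cells that together
    make up at most `fraction` of all non-empty cells."""
    hist = Counter(g.count.values())  # cell density -> number of cells with it
    seen, cutoff = 0, 0
    for density in sorted(hist):
        seen += hist[density]
        if seen / len(g.count) > fraction:
            break
        cutoff = density
    return [n for n, k in enumerate(g.keys) if g.count[k] <= cutoff]


def center_view(g, tau):
    """Assumed center rule: flag records in cells whose center of gravity lies more
    than `tau` cell widths away from the geometric center of the cell."""
    flagged = set()
    for k, size in g.count.items():
        dev2 = 0.0
        for j, i in enumerate(g.numeric):
            geometric = g.lo[i] + (k[i] + 0.5) * g.width[i]
            gravity = g.sums[k][j] / size
            dev2 += ((gravity - geometric) / g.width[i]) ** 2
        if math.sqrt(dev2) > tau:
            flagged.add(k)
    return [n for n, k in enumerate(g.keys) if k in flagged]


# Made-up records (two numeric sensor values, crop type); attribute 2 is nominal.
records = [
    (0.0, 1.0, "wheat"), (1.0, 0.0, "wheat"), (1.0, 1.0, "wheat"),
    (1.9, 1.0, "wheat"), (1.0, 1.9, "wheat"),
    (3.8, 0.2, "wheat"), (3.9, 0.1, "wheat"), (3.7, 0.3, "wheat"),
    (4.0, 4.0, "wheat"), (1.0, 1.0, "rapeseed"),
]
grid = build_grid(records, bins=2, nominal={2})
print(direct_view(grid, min_pts=2))           # [8, 9]
print(distribution_view(grid, fraction=0.5))  # [8, 9]
print(center_view(grid, tau=0.3))             # [5, 6, 7, 8]
```

In this toy run the single "rapeseed" record falls into its own cell and is flagged by the density-based views, although its numeric values match the dense "wheat" cluster. The center-based view instead flags records 5 to 7, which sit in a well-populated cell but are pushed into one corner of it. The views therefore answer different questions about the same grid, and each of them only reads the per-cell counts and sums collected during construction.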


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence