3. Konferenz Machine Learning for Cyberphysical Systems – ML4CPS, 26.-27.10.2017, Lemgo 10/2017.
Outlier detection is one of the most widely used techniques for identifying unusual behavior in raw
data. Although many outlier detection algorithms already exist, the development of
the Internet of Things over the last ten years and the rapidly evolving sensor and storage technologies
call for fast, high-volume processing techniques. In the AGATA project, we deal with
large data sets collected from combine harvester sensors [GBSM16]. Traditional anomaly detection
algorithms face a series of problems:
- Most of the distance based methods have quadratic run-time complexity [Gol14].
- Many methods do not deal with unordered nominal attributes. In practical applications,
sensor data very often contain non-numerical attributes.
- The clustering based and depth based anomaly detection algorithms need to make assumptions
on the statistical distribution of the underlying data set.
- Whether a data point is an outlier depends on different aspects of the research. Most of the
distance based algorithms deliver one-sided results based on distance between neighboring a
To overcome these problems, a Grid Based Outlier Detection algorithm (GBOD) was proposed [GGB+17].
GBOD has linear complexity and makes no assumption about the statistical distribution of the
underlying data set. This paper extends the algorithm. We will demonstrate
1. how to deal with unordered nominal attributes,
2. different views of outliers:
(a) direct view (density based outliers) which provides results similar to the k nearest neighbor
based anomaly detection algorithms,
(b) grid distribution view which provides outliers based on the grid density histogram,
(c) center based view, which provides outliers based on deviation of the center of gravity of
the data points from the geometrical center in each grid cell.
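
The three views listed above can be illustrated with a minimal sketch. It is not the GBOD implementation from [GGB+17]: it assumes two-dimensional numerical data, square grid cells of a fixed edge length, and a Euclidean offset normalized by the cell size. The names `grid_views`, `density_histogram`, and `cell_size` are hypothetical and chosen only for this illustration.

```python
from collections import Counter, defaultdict
import math


def grid_views(points, cell_size):
    """Assign 2-D points to square grid cells in a single pass.

    Returns {cell_index: (count, center_offset)} where
    - count is the number of points in the cell (direct / density view),
    - center_offset is the distance between the center of gravity of the
      points and the geometrical center of the cell, divided by cell_size
      (center based view). Both definitions are assumptions of this sketch.
    """
    cells = defaultdict(list)
    for x, y in points:
        # Integer cell index along each axis.
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append((x, y))

    stats = {}
    for (i, j), members in cells.items():
        n = len(members)
        # Center of gravity of the points that fell into this cell.
        gx = sum(p[0] for p in members) / n
        gy = sum(p[1] for p in members) / n
        # Geometrical center of the cell.
        cx = (i + 0.5) * cell_size
        cy = (j + 0.5) * cell_size
        stats[(i, j)] = (n, math.hypot(gx - cx, gy - cy) / cell_size)
    return stats


def density_histogram(stats):
    """Grid distribution view: how many cells hold a given number of points."""
    return Counter(count for count, _ in stats.values())


points = [(0.5, 0.5), (0.5, 0.5), (0.5, 0.5), (1.9, 1.9)]
stats = grid_views(points, cell_size=1.0)
histogram = density_histogram(stats)
```

In this toy input, the cell containing the single point (1.9, 1.9) has a low count and a large center offset, so it would stand out in both the direct view and the center based view. Each point is visited once when it is assigned to a cell and once when the cell statistics are computed, so the run time grows linearly with the number of points.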