Publication

Cross-Domain Transformation for Outlier Detection on Tabular Datasets

Dayananda Herurkar; Timur Sattarov; Jörn Hees; Sebastian Palacio; Federico Raue; Andreas Dengel
In: IJCNN 2023: International Joint Conference on Neural Networks (IJCNN-2023), June 18-23, Gold Coast Convention and Exhibition Centre, Queensland, Australia. DFKI Research Reports (RR), IEEE Xplore, 2023.

Abstract

The overwhelming success of Deep Learning approaches in recent years is often driven by the availability of large public datasets. However, in some domains such as finance, creating and sharing realistic datasets is hindered by secrecy or privacy concerns. This can lead to a mismatch in which approaches that have proven to work well on public, research-oriented datasets underperform when applied to real-world (private) datasets. In this work, we focus on the task of Outlier Detection (OD) and bridge this gap by building an autoencoder-based Deep Learning approach that can transform samples between two tabular datasets (e.g., a private and a public one). The goal of our approach is for transformed samples to become similar to the target dataset, while inliers remain inliers and outliers remain outliers. Among other benefits, a successful transformation allows methods that have proven effective on public datasets to be applied to internal datasets, even if the datasets differ in dimensionality (rows and columns). To evaluate our approach, we introduce metrics that measure dataset similarity and the quality of transformed samples. Our experimental results show that combining public datasets with transformed samples of other datasets leads to higher dataset similarity while sustaining performance w.r.t. common OD algorithms.

Projects