Publication

ED2: A Case for Active Learning in Error Detection

Felix Neutatz, Mohammad Mahdavi, Ziawasch Abedjan

In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management (CIKM '19), November 3-7, 2019, Beijing, China, pages 2249-2252. ACM, 2019. ISBN 978-1-4503-6976-3.

Abstract

State-of-the-art approaches formulate error detection as a semi-supervised classification problem. Recent research suggests that active learning is not sufficiently effective for error detection and proposes using neural networks and data augmentation to reduce the number of required user-provided labels. However, we show that with an appropriate active learning strategy, it is possible to outperform the more complex models that rely on data augmentation. To this end, we propose a multi-classifier approach with two-stage sampling for active learning. This simple and intuitive sampling method selects the most promising cells across rows and columns for labeling. On three datasets, ED2 achieves state-of-the-art detection accuracy, and on large datasets it requires an order of magnitude fewer user labels than the state of the art.
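The abstract does not state the exact sampling criteria, so the following is only a minimal, hypothetical sketch of what two-stage cell sampling over a table could look like. It assumes one classifier per column that outputs, for every cell, a probability that the cell is erroneous, and it uses prediction uncertainty in both stages: first pick a column, then pick a row within that column. The names `uncertainty` and `two_stage_sample` and both scoring rules are illustrative assumptions, not the ED2 method itself.

```python
from typing import List, Set, Tuple


def uncertainty(p: float) -> float:
    """Uncertainty of a binary prediction: 1.0 at p = 0.5, 0.0 at p = 0 or 1."""
    return 1.0 - abs(2.0 * p - 1.0)


def two_stage_sample(
    probs: List[List[float]], labeled: Set[Tuple[int, int]]
) -> Tuple[int, int]:
    """Return the next (row, col) cell to show to the user (hypothetical rule).

    probs[r][c] is an assumed per-column classifier's probability that
    cell (r, c) is erroneous; `labeled` holds cells the user already labeled.
    """
    n_rows, n_cols = len(probs), len(probs[0])

    # Stage 1: choose the column whose unlabeled cells are most uncertain on average.
    best_col, best_col_score = -1, -1.0
    for c in range(n_cols):
        scores = [uncertainty(probs[r][c]) for r in range(n_rows) if (r, c) not in labeled]
        if not scores:
            continue  # every cell in this column is already labeled
        mean = sum(scores) / len(scores)
        if mean > best_col_score:
            best_col, best_col_score = c, mean
    if best_col < 0:
        raise ValueError("all cells are already labeled")

    # Stage 2: within that column, choose the unlabeled row whose cells are
    # most uncertain in total across all columns (assumed criterion).
    best_row, best_row_score = -1, -1.0
    for r in range(n_rows):
        if (r, best_col) in labeled:
            continue
        total = sum(uncertainty(p) for p in probs[r])
        if total > best_row_score:
            best_row, best_row_score = r, total
    return best_row, best_col


if __name__ == "__main__":
    probs = [
        [0.9, 0.1, 0.5],
        [0.5, 0.4, 0.6],
        [0.1, 0.0, 0.95],
    ]
    print(two_stage_sample(probs, set()))     # (1, 2)
    print(two_stage_sample(probs, {(1, 2)}))  # (0, 2)
```

Scoring rows by their summed uncertainty across all columns is one way to let a single labeled row inform several column classifiers at once; other uncertainty or disagreement measures would fit the same two-stage shape.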

Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence