Publication
Towards Extending XAI for Full Data Science Pipelines
Nadja Geisler; Carsten Binnig
In: Jean-Daniel Fekete; Behrooz Omidvar-Tehrani; Kexin Rong; Roee Shraga (Eds.). Proceedings of the 2024 Workshop on Human-In-the-Loop Data Analytics, HILDA 24, Santiago, Chile, 14 June 2024. Workshop on Human-In-the-Loop Data Analytics (HILDA), Pages 1-7, ACM, 2024.
Abstract
Data preprocessing and engineering are essential parts of any AI system, as indicated by the current trend of data-centric AI. However, until now, explainability efforts have almost exclusively focused on models. We propose explanations for preprocessing pipelines that express the impact of each step on the resulting model behavior based on existing feature attribution methods. In the process, we introduce two related but distinct measures of impact for preprocessing steps: Leave-out Impact (What do we lose/gain by leaving out this step?) and Immediate Impact (What do we lose/gain by adding this step at this time?). Both are obtained by constructing variations of the original pipeline and comparing the resulting model behavior represented as feature importance vectors. These measures reflect the intuition of impact but also express the effects of a step and its interactions with the rest of the pipeline on the internal workings of the trained model.
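The two measures can be illustrated with a minimal Python sketch. It is one reading of the abstract, not the authors' implementation, and it rests on several assumptions. The names `importance`, `run`, `distance`, `leave_out_impact` and `immediate_impact`, the toy data, and the two example steps are all hypothetical. The absolute Pearson correlation of each feature with the target stands in for a real feature attribution method computed on a trained model, and the L1 distance between importance vectors is an arbitrary choice of comparison. Immediate Impact is interpreted here as comparing the pipeline prefix that ends with the step against the prefix that stops just before it.

```python
from math import sqrt


def importance(rows, target):
    """Stand-in attribution (assumption): |Pearson r| of each feature with the target."""
    n = len(rows)
    mean_y = sum(target) / n
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in target))
    scores = []
    for j in range(len(rows[0])):
        col = [r[j] for r in rows]
        mean_x = sum(col) / n
        sd_x = sqrt(sum((x - mean_x) ** 2 for x in col))
        cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(col, target))
        scores.append(abs(cov) / (sd_x * sd_y) if sd_x and sd_y else 0.0)
    return scores


def run(steps, rows):
    """Apply preprocessing steps in order; each step returns new rows."""
    for step in steps:
        rows = step(rows)
    return rows


def distance(a, b):
    """L1 distance between two feature importance vectors (assumed comparison)."""
    return sum(abs(x - y) for x, y in zip(a, b))


def leave_out_impact(steps, i, rows, target):
    """Compare the full pipeline with the variation that omits step i."""
    full = importance(run(steps, rows), target)
    without = importance(run(steps[:i] + steps[i + 1:], rows), target)
    return distance(full, without)


def immediate_impact(steps, i, rows, target):
    """Compare the prefix ending with step i against the prefix before it."""
    after = importance(run(steps[:i + 1], rows), target)
    before = importance(run(steps[:i], rows), target)
    return distance(after, before)


# Hypothetical preprocessing steps; both keep the number of features fixed.
def clip_first(rows):
    return [[min(r[0], 5.0)] + r[1:] for r in rows]


def square_first(rows):
    return [[r[0] ** 2] + r[1:] for r in rows]


def identity(rows):
    return rows


# Toy data (made up): feature 0 contains one outlier.
rows = [[1.0, -2.0], [2.0, -1.0], [3.0, 0.0], [4.0, 1.0], [100.0, 2.0]]
target = [1.0, 2.0, 3.0, 4.0, 5.0]
steps = [clip_first, square_first]

for i, step in enumerate(steps):
    print(step.__name__,
          "leave-out:", leave_out_impact(steps, i, rows, target),
          "immediate:", immediate_impact(steps, i, rows, target))
```

Under this reading, the two measures coincide for the final step of a pipeline, because leaving it out and stopping before it produce the same variation. They can differ for earlier steps, where later steps interact with the change. The sketch also assumes that every step preserves the feature layout, so that importance vectors stay comparable. Steps that add or remove features would need an alignment scheme that is not shown here.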
