Interpretable explanations of black box classifiers applied on medical images by meaningful perturbations using variational autoencoders

Hristina Uzunova, Jan Ehrhardt, Timo Kepp, Heinz Handels

In: Elsa D. Angelini, Bennett A. Landman (Eds.). Medical Imaging 2019: Image Processing. SPIE Medical Imaging, February 16-21, San Diego, CA, United States, pages 264-271, volume 10949, SPIE, 2019.


The growing popularity of black box machine learning methods for medical image analysis makes their interpretability a crucial task. To make a system, e.g. a trained neural network, trustworthy for a clinician, it needs to be able to explain its decisions and predictions. In this work, we tackle the problem of generating plausible explanations for the predictions of medical image classifiers that are trained to distinguish between different types of pathologies and healthy tissue. An intuitive way to determine which image regions influence the trained classifier is to check whether the classification result changes when those regions are deleted. This idea can be formulated as a minimization problem and thus implemented efficiently. However, what it means to “delete” an image region, in our case a pathology in a medical image, is not defined. Our contribution is to define the deletion of a pathology as its replacement by a healthy-looking equivalent generated by a variational autoencoder. Experiments with a classification neural network on OCT (Optical Coherence Tomography) images and brain lesion MRIs show that a meaningful replacement of “deleted” image regions has a significant impact on the reliability of the generated explanations. The proposed deletion method proves successful: our approach delivers the best results in comparison with four other established methods.
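The deletion idea above can be illustrated with a small sketch of a mask-based minimization in the spirit of meaningful perturbations. It is not the authors' implementation and rests on several assumptions: the classifier is a toy logistic model `p = sigmoid(sum(w * x) + b)` so gradients can be written by hand, `x_healthy` stands in for the healthy-looking image a variational autoencoder would produce (no VAE is implemented here), and the function name `explain_by_deletion` and the weights `l1`, `tv`, `lr`, `steps` are hypothetical. A mask value of 1 keeps a pixel; a value of 0 “deletes” it by swapping in the healthy equivalent.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def explain_by_deletion(x, x_healthy, w, b, l1=0.05, tv=0.05, lr=0.5, steps=300):
    """Hypothetical sketch: optimise a mask m in [0, 1] by projected gradient descent.

    Blend:      x_m = m * x + (1 - m) * x_healthy   (m = 0 replaces a pixel)
    Classifier: p(x_m) = sigmoid(sum(w * x_m) + b)  (toy stand-in, an assumption)
    Objective:  p(x_m) + l1 * mean(1 - m) + tv * mean of squared neighbour differences of m
    """
    m = np.ones_like(x)  # start with nothing deleted
    for _ in range(steps):
        blend = m * x + (1.0 - m) * x_healthy
        p = sigmoid(np.sum(w * blend) + b)
        # Classifier term: dp/dm = p (1 - p) * w * (x - x_healthy)
        g = p * (1.0 - p) * w * (x - x_healthy)
        # Sparsity term l1 * mean(1 - m): derivative is -l1 / n for every pixel
        g -= l1 / m.size
        # Smoothness term: squared differences of vertical and horizontal neighbours
        dv = np.diff(m, axis=0)
        dh = np.diff(m, axis=1)
        g_tv = np.zeros_like(m)
        g_tv[1:, :] += 2.0 * dv
        g_tv[:-1, :] -= 2.0 * dv
        g_tv[:, 1:] += 2.0 * dh
        g_tv[:, :-1] -= 2.0 * dh
        g += tv * g_tv / m.size
        # Gradient step, then project back onto [0, 1]
        m = np.clip(m - lr * g, 0.0, 1.0)
    return m


if __name__ == "__main__":
    # Synthetic 8x8 "image" with a bright 3x3 lesion; the healthy version is all zeros.
    x = np.zeros((8, 8))
    x[2:5, 2:5] = 1.0
    x_healthy = np.zeros_like(x)
    w, b = np.ones_like(x), -4.5
    m = explain_by_deletion(x, x_healthy, w, b)
    blend = m * x + (1.0 - m) * x_healthy
    print("mask over lesion:", m[2:5, 2:5].mean().round(3))
    print("p(original):", sigmoid(np.sum(w * x) + b).round(3))
    print("p(after deletion):", sigmoid(np.sum(w * blend) + b).round(3))
```

Regions where the optimized mask falls toward 0 are those whose replacement by the healthy equivalent lowers the classifier's score, which is how the mask serves as an explanation. Swapping in a healthy-looking image, rather than blurring or zeroing the region, is what keeps the perturbed input plausible.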

German Research Center for Artificial Intelligence