Publication

Adversarial Defense based on Structure-to-Signal Autoencoders

Joachim Folz, Sebastian Palacio, Jörn Hees, Andreas Dengel

In: WACV. IEEE Winter Conference on Applications of Computer Vision (WACV-2020), March 2-5, Snowmass Village, Colorado, United States. IEEE, 2020.

Abstract

Adversarial attacks have exposed the intricacies of the complex loss surfaces approximated by neural networks. Mitigating the effects of said attacks has proven non-trivial but has also raised the need for more interpretable models. In this paper, we present a defense strategy against gradient-based attacks, on the premise that input gradients need to expose information about the semantic manifold for attacks to be successful. We propose an architecture based on compressive autoencoders (AEs) with a two-stage training scheme, creating not only an architectural bottleneck but also a representational bottleneck. We show that the proposed mechanism yields robust results against a collection of gradient-based attacks under challenging white-box conditions. This defense is attack-agnostic and can, therefore, be used for arbitrary pre-trained models, while not compromising the original performance. These claims are supported by experiments conducted with state-of-the-art image classifiers (ResNet50 and Inception v3), on the full ImageNet validation set. Experiments, including counterfactual analysis, empirically show that the robustness stems from a shift in the distribution of input gradients, which mitigates the effect of tested adversarial attack methods. Gradients propagated through the proposed AEs represent less semantic information and instead point to low-level structural features.
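The idea that a white-box attacker only sees gradients after they have passed through the autoencoder can be illustrated with a toy sketch. This is not the S2S architecture or the two-stage training scheme from the paper: the linear encoder, decoder, and classifier below, and all weights and function names, are hypothetical stand-ins chosen so the chain rule can be checked by hand. It makes no claim about robustness; it only shows how a bottleneck reshapes the input gradient.

```python
# Toy illustration only: linear maps stand in for the autoencoder and the
# classifier. All weights are hypothetical and not taken from the paper.

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

def transpose(matrix):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*matrix)]

# Encoder: 4 inputs -> 2 codes (the architectural bottleneck).
W_ENC = [[1.0, 1.0, 0.0, 0.0],
         [0.0, 0.0, 1.0, 1.0]]
# Decoder: 2 codes -> 4 outputs; each code is spread back over its input pair.
W_DEC = [[0.5 * v for v in row] for row in transpose(W_ENC)]
# Linear "classifier": score(x) = w . x
W_CLS = [1.0, -2.0, 0.5, 3.0]

def autoencode(x):
    """Pass x through the encoder and decoder."""
    return matvec(W_DEC, matvec(W_ENC, x))

def score(x):
    """Classifier score for input x."""
    return sum(w * xi for w, xi in zip(W_CLS, x))

def input_gradient_undefended():
    """d score(x) / dx for the bare classifier: simply w."""
    return list(W_CLS)

def input_gradient_defended():
    """d score(autoencode(x)) / dx by the chain rule: W_enc^T W_dec^T w."""
    return matvec(transpose(W_ENC), matvec(transpose(W_DEC), W_CLS))

if __name__ == "__main__":
    print("undefended:", input_gradient_undefended())
    print("defended:  ", input_gradient_defended())
```

In this toy setting the undefended gradient equals the classifier weights, while the gradient taken through the autoencoder is constant within each pair of inputs that the encoder pools together. It therefore reflects the structure of the bottleneck rather than the classifier's per-feature weights, which is loosely analogous to the abstract's observation about gradients pointing to low-level structure.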


Deutsches Forschungszentrum für Künstliche Intelligenz
German Research Center for Artificial Intelligence