FADER: Fast Adversarial Example Rejection
Format: | Article |
Language: | English |
Abstract: | Deep neural networks are vulnerable to adversarial examples, i.e.,
carefully-crafted inputs that mislead classification at test time. Recent
defenses have been shown to improve adversarial robustness by detecting
anomalous deviations from legitimate training samples at different layer
representations - a behavior normally exhibited by adversarial attacks. Despite
technical differences, all the aforementioned methods share a common backbone
structure that we formalize and highlight in this contribution, as it can help
in identifying promising research directions and drawbacks of existing methods.
The first main contribution of this work is a review of these detection
methods in the form of a unifying framework designed to accommodate both
existing defenses and newer ones to come. In terms of drawbacks, the
aforementioned defenses require comparing input samples against a large
number of reference prototypes, possibly at different representation layers,
dramatically worsening test-time efficiency. Moreover, such defenses are
typically based on ensembling classifiers with heuristic methods, rather than
optimizing the whole architecture end-to-end to better perform detection. As a
second main contribution of this work, we introduce FADER, a novel technique
for speeding up detection-based methods. FADER overcomes the issues above by
employing RBF networks as detectors: by fixing the number of required
prototypes, the runtime complexity of adversarial example detectors can be
controlled. Our experiments show up to a 73x reduction in prototypes compared
to the analyzed detectors on the MNIST dataset, and up to a 50x reduction on
CIFAR10, without sacrificing classification accuracy on either clean or
adversarial data. |
---|---|
DOI: | 10.48550/arxiv.2010.09119 |
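The abstract's key efficiency argument - that fixing the number of RBF prototypes bounds the detector's runtime - can be illustrated with a minimal sketch. This is not the paper's implementation; the function names, the Gaussian kernel choice, and the rejection threshold are assumptions for illustration only. The cost of scoring one input is linear in the (fixed) number of prototypes, regardless of training-set size.

```python
import numpy as np

def rbf_scores(x, prototypes, gamma=1.0):
    """RBF activations of input x against a fixed prototype set.

    Each score is exp(-gamma * ||x - p||^2). Runtime is O(k * d) for
    k prototypes of dimension d - k is fixed in advance, which is the
    efficiency property the abstract attributes to RBF detectors.
    (Hypothetical sketch, not the FADER implementation.)
    """
    d2 = np.sum((prototypes - x) ** 2, axis=1)  # squared distance to each prototype
    return np.exp(-gamma * d2)

def reject(x, prototypes, threshold=0.5, gamma=1.0):
    """Flag x as anomalous if no prototype activates above the threshold,
    i.e., x lies far from all reference points in representation space."""
    return np.max(rbf_scores(x, prototypes, gamma)) < threshold
```

With prototypes placed near legitimate training data, an in-distribution sample strongly activates at least one prototype and is accepted, while an input far from all prototypes (the anomalous deviation the detectors look for) is rejected; the threshold and gamma would be tuned on validation data.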