Transferable Perturbations of Deep Feature Distributions
Main authors: | |
---|---|
Format: | Article |
Language: | English |
Abstract: | Almost all current adversarial attacks on CNN classifiers rely on
information derived from the output layer of the network. This work presents a
new adversarial attack based on the modeling and exploitation of class-wise and
layer-wise deep feature distributions. We achieve state-of-the-art targeted
black-box transfer-based attack results for undefended ImageNet models.
Further, we place a priority on explainability and interpretability of the
attacking process. Our methodology affords an analysis of how adversarial
attacks change the intermediate feature distributions of CNNs, as well as a
measure of layer-wise and class-wise feature distributional
separability/entanglement. We also conceptualize a transition from
task/data-specific to model-specific features within a CNN architecture that
directly impacts the transferability of adversarial examples. |
---|---|
DOI: | 10.48550/arxiv.2004.12519 |
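
Although this is a catalog record rather than the paper itself, the abstract describes the attack concretely enough to sketch its core loop: instead of attacking output-layer logits, perturb the input so its intermediate features fall into the target class's modeled feature distribution. The following is a minimal, hypothetical PyTorch sketch in that spirit. The names `feature_extractor` (the white-box source model truncated at an intermediate layer) and `aux_head` (an auxiliary probe assumed to be trained separately to model p(class | layer-l features)), as well as the PGD-style hyperparameters, are illustrative assumptions, not details taken from the paper.

```python
# Minimal sketch of a layer-wise feature-distribution attack.
# ASSUMPTIONS: `feature_extractor` and `aux_head` are hypothetical stand-ins;
# the paper's actual losses, probe training, and hyperparameters may differ.
import torch
import torch.nn.functional as F

def feature_distribution_attack(x, target, feature_extractor, aux_head,
                                eps=16 / 255, alpha=2 / 255, steps=10):
    """Perturb x so its layer-l features look like the target class's features."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        feats = feature_extractor(x_adv)   # intermediate (layer-l) features
        logits = aux_head(feats)           # auxiliary class probe on those features
        # Maximize log p(target | f_l(x_adv)): push the features into the target
        # class's modeled feature distribution rather than attacking output logits.
        loss = F.cross_entropy(logits, target)
        grad, = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv - alpha * grad.sign()                # descend target loss
            x_adv = x + (x_adv - x).clamp(-eps, eps)           # project to L-inf ball
            x_adv = x_adv.clamp(0, 1)                          # valid image range
    return x_adv.detach()
```

Under this reading, adversarial examples are crafted entirely against intermediate features of the white-box source model and then evaluated on a separate black-box model to measure targeted transferability.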