Enhancing Adversarial Example Transferability with an Intermediate Level Attack
Format: Article
Language: English
Online access: Order full text
Abstract: Neural networks are vulnerable to adversarial examples, malicious inputs crafted to fool trained models. Adversarial examples often exhibit black-box transfer, meaning that adversarial examples for one model can fool another model. However, adversarial examples are typically overfit to exploit the particular architecture and feature representation of a source model, resulting in sub-optimal black-box transfer attacks to other target models. We introduce the Intermediate Level Attack (ILA), which attempts to fine-tune an existing adversarial example for greater black-box transferability by increasing its perturbation on a pre-specified layer of the source model, improving upon state-of-the-art methods. We show that we can select a layer of the source model to perturb without any knowledge of the target models while achieving high transferability. Additionally, we provide some explanatory insights regarding our method and the effect of optimizing for adversarial examples using intermediate feature maps. Our code is available at https://github.com/CUVL/Intermediate-Level-Attack.
DOI: 10.48550/arxiv.1907.10823
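
The abstract describes a two-stage procedure: first produce a reference adversarial example with any existing attack, then run a second optimization pass that enlarges the perturbation at a chosen intermediate layer of the source model, pushing it further along the direction the reference attack established. Below is a minimal PyTorch sketch of that idea, assuming an L_inf-bounded attack and a projection-style objective on the intermediate features; the function name, arguments, default hyperparameters, and hook plumbing are illustrative assumptions, not the authors' exact implementation (see the linked repository for that).

# A minimal sketch of ILA-style fine-tuning, under the assumptions noted
# above; names and defaults here are hypothetical, not the authors' API.
import torch

def ila_projection_finetune(model, layer, x, x_adv_ref,
                            eps=8 / 255, step_size=1 / 255, n_iters=10):
    """Fine-tune a reference adversarial example x_adv_ref by amplifying
    its perturbation at an intermediate `layer` of the source model."""
    feats = {}
    handle = layer.register_forward_hook(
        lambda module, inputs, output: feats.__setitem__("out", output))

    with torch.no_grad():
        model(x)
        f_clean = feats["out"].detach()
        model(x_adv_ref)
        # Direction the reference attack carved out in feature space.
        ref_dir = (feats["out"] - f_clean).flatten(1).detach()

    x_adv = x_adv_ref.clone().detach()
    for _ in range(n_iters):
        x_adv.requires_grad_(True)
        model(x_adv)
        cur_dir = (feats["out"] - f_clean).flatten(1)
        # Maximize the projection of the current mid-layer perturbation
        # onto the reference direction: this both aligns with and
        # enlarges the perturbation at the pre-specified layer.
        loss = (ref_dir * cur_dir).sum()
        (grad,) = torch.autograd.grad(loss, x_adv)
        with torch.no_grad():
            x_adv = x_adv + step_size * grad.sign()   # ascent step
            x_adv = x + (x_adv - x).clamp(-eps, eps)  # stay in L_inf ball
            x_adv = x_adv.clamp(0.0, 1.0)             # valid pixel range

    handle.remove()
    return x_adv.detach()

Here `layer` would be a mid-network module of the source model (for a torchvision ResNet, e.g., one of its residual stages). Consistent with the abstract's claim, the layer is chosen from the source model alone, with no access to the target models.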