Boosting Adversarial Attacks by Leveraging Decision Boundary Information
Saved in:
Main Authors: | , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | Due to the gap between a substitute model and a victim model, the
gradient-based noise generated from a substitute model may have low
transferability to a victim model since their gradients are different.
Inspired by the fact that the decision boundaries of different models do not
differ much, we conduct experiments and discover that the gradients of
different models are more similar on the decision boundary than at the original
position. Moreover, since the decision boundary in the vicinity of an input
image is flat along most directions, we conjecture that the boundary gradients
can help find an effective direction to cross the decision boundary of the
victim models. Based on this, we propose a Boundary Fitting Attack to improve
transferability. Specifically, we introduce a method to obtain a set of
boundary points and leverage the gradient information of these points to update
the adversarial examples (a code sketch of this idea follows the record below).
Notably, our method can be combined with existing gradient-based methods.
Extensive experiments demonstrate the effectiveness of our method, i.e., it
improves the success rate by 5.6% against normally trained CNNs and 14.9%
against defense CNNs on average compared to state-of-the-art transfer-based
attacks. Furthermore, we compare transformers with CNNs; the results indicate
that transformers are more robust than CNNs. However, our method still
outperforms existing methods when attacking transformers. Specifically, when
using CNNs as substitute models, our method obtains an average attack success
rate of 58.2%, which is 10.8% higher than other state-of-the-art transfer-based
attacks. |
DOI: | 10.48550/arxiv.2303.05719 |
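
The abstract describes obtaining a set of boundary points near the adversarial example and using their gradients, rather than the gradient at the current point, to update the attack. The minimal PyTorch sketch below illustrates that general idea only; the binary search in `find_boundary_point`, the random search directions, the helper names, and the parameters (`n_dirs`, `alpha`, `max_dist`) are all illustrative assumptions, not the paper's actual Boundary Fitting Attack.

```python
# Illustrative sketch only -- not the paper's exact algorithm.
import torch
import torch.nn.functional as F

@torch.no_grad()
def find_boundary_point(model, x, y, direction, max_dist=0.1, steps=10):
    """Binary-search along `direction` (from x, true label y, batch size 1)
    for a point where the substitute model's prediction flips, i.e. a point
    on its decision boundary. Returns None if no flip is found."""
    direction = direction / (direction.norm() + 1e-12)
    lo, hi = 0.0, max_dist
    # Expand `hi` until the prediction flips (or give up).
    for _ in range(steps):
        if model(x + hi * direction).argmax(dim=1) != y:
            break
        hi *= 2.0
    else:
        return None
    # Bisect between lo (still class y) and hi (already flipped).
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if model(x + mid * direction).argmax(dim=1) == y:
            lo = mid
        else:
            hi = mid
    return x + hi * direction

def boundary_fitting_step(model, x_adv, y, n_dirs=5, alpha=2.0 / 255):
    """One I-FGSM-style update that averages gradients taken at nearby
    boundary points instead of at x_adv itself. Projection back onto the
    epsilon-ball around the clean image is omitted for brevity."""
    grads = []
    for _ in range(n_dirs):
        direction = torch.randn_like(x_adv)
        xb = find_boundary_point(model, x_adv, y, direction)
        if xb is None:
            continue
        xb = xb.clone().detach().requires_grad_(True)
        F.cross_entropy(model(xb), y).backward()
        grads.append(xb.grad.detach())
    if not grads:  # fall back to the ordinary gradient at x_adv
        xb = x_adv.clone().detach().requires_grad_(True)
        F.cross_entropy(model(xb), y).backward()
        grads.append(xb.grad.detach())
    g = torch.stack(grads).mean(dim=0)
    return (x_adv + alpha * g.sign()).clamp(0, 1)
```

Under these assumptions, such a step could replace the plain gradient step inside any iterative transfer attack (e.g., MI-FGSM), which is consistent with the abstract's claim that the method can be combined with existing gradient-based methods.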