Improving the transferability of adversarial examples with separable positive and negative disturbances
Adversarial examples demonstrate the vulnerability of white-box models but exhibit weak transferability to black-box models. In image processing, each adversarial example usually consists of original image and disturbance. The disturbances are essential for the adversarial examples, determining the...
Gespeichert in:
Veröffentlicht in: | Neural computing & applications 2024-03, Vol.36 (7), p.3725-3736 |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Adversarial examples demonstrate the vulnerability of white-box models but exhibit weak transferability to black-box models. In image processing, each adversarial example usually consists of original image and disturbance. The disturbances are essential for the adversarial examples, determining the attack success rate on black-box models. To improve the transferability, we propose a new white-box attack method called separable positive and negative disturbance (SPND). SPND optimizes the positive and negative perturbations instead of the adversarial examples. SPND also smooths the search space by replacing constrained disturbances with unconstrained variables, which improves the success rate of attacking the black-box model. Our method outperforms the other attack methods in the MNIST and CIFAR10 datasets. In the ImageNet dataset, the black-box attack success rate of SPND exceeds the optimal CW method by nearly ten percentage points under the perturbation of
L
∞
=
0.3
. |
---|---|
ISSN: | 0941-0643 1433-3058 |
DOI: | 10.1007/s00521-023-09259-5 |