Improving the transferability of adversarial examples with separable positive and negative disturbances

Adversarial examples demonstrate the vulnerability of white-box models but exhibit weak transferability to black-box models. In image processing, each adversarial example usually consists of original image and disturbance. The disturbances are essential for the adversarial examples, determining the...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Neural computing & applications 2024-03, Vol.36 (7), p.3725-3736
Hauptverfasser: Yan, Yuanjie, Bu, Yuxuan, Shen, Furao, Zhao, Jian
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Adversarial examples demonstrate the vulnerability of white-box models but exhibit weak transferability to black-box models. In image processing, each adversarial example usually consists of original image and disturbance. The disturbances are essential for the adversarial examples, determining the attack success rate on black-box models. To improve the transferability, we propose a new white-box attack method called separable positive and negative disturbance (SPND). SPND optimizes the positive and negative perturbations instead of the adversarial examples. SPND also smooths the search space by replacing constrained disturbances with unconstrained variables, which improves the success rate of attacking the black-box model. Our method outperforms the other attack methods in the MNIST and CIFAR10 datasets. In the ImageNet dataset, the black-box attack success rate of SPND exceeds the optimal CW method by nearly ten percentage points under the perturbation of L ∞ = 0.3 .
ISSN:0941-0643
1433-3058
DOI:10.1007/s00521-023-09259-5