Boosting transferability of targeted adversarial examples with non-robust feature alignment

Bibliographic details
Published in: Expert Systems with Applications, 2023-10, Vol. 227, p. 120248, Article 120248
Main authors: Zhu, Hegui; Sui, Xiaoyan; Ren, Yuchen; Jia, Yanmeng; Zhang, Libo
Format: Article
Language: English
Description
Summary: Deep neural networks can be deceived by adding carefully crafted noise to an image. Feature-space adversarial attacks are a powerful approach that improves the transferability of targeted adversarial examples by modulating the intermediate-layer representations of images. However, there is still considerable room for improvement in transferring targeted adversarial examples, so we propose an efficient Non-robust Feature Alignment targeted adversarial attack method called NFAA. It generates targeted adversarial examples by simultaneously filtering out features of the original class and adding features of the target class. First, since DNNs are sensitive to high-frequency information in an image, we employ bilateral filtering to filter out the high-frequency information of the original image, which is treated as a non-robust component in our work. Then, we design a weighted Chi-squared distance to align the feature differences in the intermediate layers, where the feature weights help the perturbed image learn more non-robust information from the target image. In addition, to align better with the target example, we use the integrated features in the output layer to construct an auxiliary loss. Finally, extensive quantitative experiments and visualization analysis verify that the targeted adversarial examples generated by NFAA transfer effectively to models with various architectures, including DN121, VGG19bn, Inc-v3, and RN50. In particular, NFAA also exhibits strong transfer performance in defense settings, on online recognition platforms, and against object detection models. These results demonstrate the effectiveness of the proposed attack method.
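The bilateral filtering step mentioned in the summary can be illustrated with a minimal pure-Python sketch. This is not the authors' implementation: the function name `bilateral_filter`, the grayscale list-of-lists image format, and the parameters `radius`, `sigma_space`, and `sigma_range` are assumptions chosen for illustration, and the record does not state the settings used in the paper.

```python
import math

def bilateral_filter(img, radius=1, sigma_space=1.0, sigma_range=0.1):
    """Edge-preserving smoothing of a 2D grayscale image (list of rows).

    Hypothetical helper: each output pixel is a weighted mean of its
    neighbours, where the weight falls off with spatial distance and
    with intensity difference from the centre pixel.
    """
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            center = img[y][x]
            acc, norm = 0.0, 0.0
            for dy in range(-radius, radius + 1):
                for dx in range(-radius, radius + 1):
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        v = img[ny][nx]
                        # Spatial weight: nearby pixels count more.
                        ws = math.exp(-(dx * dx + dy * dy) / (2 * sigma_space ** 2))
                        # Range weight: similar intensities count more.
                        wr = math.exp(-((v - center) ** 2) / (2 * sigma_range ** 2))
                        acc += ws * wr * v
                        norm += ws * wr
            # norm >= 1 because the centre pixel always has weight 1.
            out[y][x] = acc / norm
    return out
```

With a small `sigma_range`, fine fluctuations inside a flat region are averaged away while strong edges are left nearly untouched, which is the sense in which the filter suppresses high-frequency detail.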
• A non-robust feature alignment targeted adversarial attack method is established.
• The weighted Chi-squared distance aligns feature differences in the intermediate layers.
• The integrated features are used as an auxiliary loss to guide perturbation generation.
• Experiments and visualizations show that our method outperforms the SOTA methods.
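As a rough illustration of a weighted Chi-squared distance between two feature vectors, the sketch below uses one common form of the Chi-squared distance with a per-feature weight. The exact formula and the way the weights are derived in the paper are not given in this record, so the function name `weighted_chi2`, the use of absolute values in the denominator, and the `eps` stabiliser are all assumptions.

```python
def weighted_chi2(source_feats, target_feats, weights, eps=1e-12):
    """Weighted Chi-squared distance between two flattened feature vectors.

    Hypothetical form: sum_i w_i * (s_i - t_i)^2 / (|s_i| + |t_i| + eps).
    Absolute values keep the denominator non-negative for signed features.
    """
    if not (len(source_feats) == len(target_feats) == len(weights)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for s, t, w in zip(source_feats, target_feats, weights):
        total += w * (s - t) ** 2 / (abs(s) + abs(t) + eps)
    return total
```

In an attack loop, a distance of this kind would be minimised between the intermediate-layer features of the perturbed image and those of the target image; larger weights make the corresponding features contribute more to the alignment.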
ISSN: 0957-4174
EISSN: 1873-6793
DOI: 10.1016/j.eswa.2023.120248