Detection of backdoor attacks using targeted universal adversarial perturbations for deep neural networks
Published in: The Journal of systems and software, 2024-01, Vol. 207, p. 111859, Article 111859
Format: Article
Language: English
Abstract: Backdoor attacks on deep neural networks (DNNs) using targeted universal adversarial perturbations (TUAPs) require neither training datasets nor model tampering, and TUAP-based triggers can make DNNs output any class the adversary wants. Retraining DNNs with adversarial training for security is time-consuming and does not apply to DNNs at runtime. We aim to detect such backdoors with a black-box testing approach. We observe that after superimposing random noise on the input of a backdoor attack, the output still tends to remain the same, so we propose a Sequential Analysis method based on Metamorphosis Testing (SAMT). We design two metamorphic relations for test-case generation. Using sequential sampling, we calculate the label stability rate (LSR) and infer whether the image under verification contains a trigger from the change in the sequential probability ratio. Experimental results show that our method achieves a higher backdoor detection success rate (DSR) than state-of-the-art detection algorithms. Moreover, our method does not need the model structure of the DNN, which gives it greater adaptability and generalization ability. Based on the proposed method, a backdoor detection layer can simply be added to detect backdoors as early as possible, which eventually alleviates the harm of such backdoors.
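The abstract's core idea is that a triggered input keeps its predicted label under random noise, so a sequential probability ratio test over the label stability rate can flag it. The sketch below is a hypothetical illustration of that idea, not the paper's exact SAMT procedure: the function `lsr_sprt_detect`, the stability probabilities `p_clean`/`p_trigger`, and all thresholds are assumptions chosen for demonstration, with a classical Wald SPRT standing in for the paper's sequential analysis.

```python
import numpy as np

def lsr_sprt_detect(predict, image, p_clean=0.5, p_trigger=0.9,
                    alpha=0.05, beta=0.05, noise_std=0.1,
                    max_samples=100, rng=None):
    """Sequential backdoor check via label stability under random noise.

    Hypothetical sketch: a TUAP-triggered image tends to keep its
    predicted label when random noise is superimposed, so a high label
    stability rate (LSR) is treated as evidence of a trigger. Wald's
    sequential probability ratio test (SPRT) decides between
    H0 (clean: per-sample stability ~ p_clean) and
    H1 (triggered: per-sample stability ~ p_trigger).
    """
    rng = np.random.default_rng(rng)
    base_label = predict(image)
    # SPRT decision thresholds on the cumulative log-likelihood ratio.
    log_a = np.log(beta / (1 - alpha))   # accept H0 (clean) below this
    log_b = np.log((1 - beta) / alpha)   # accept H1 (trigger) above this
    llr, stable = 0.0, 0
    for n in range(1, max_samples + 1):
        # Superimpose Gaussian noise and re-query the black-box model.
        noisy = image + rng.normal(0.0, noise_std, size=image.shape)
        same = predict(noisy) == base_label
        stable += same
        # Update the log-likelihood ratio with this Bernoulli observation.
        if same:
            llr += np.log(p_trigger / p_clean)
        else:
            llr += np.log((1 - p_trigger) / (1 - p_clean))
        if llr >= log_b:
            return "trigger", stable / n   # label stayed put too often
        if llr <= log_a:
            return "clean", stable / n     # label flipped often enough
    return "undecided", stable / max_samples
```

Because the test is sequential, a decision is usually reached after only a handful of model queries, which matches the abstract's goal of a lightweight runtime detection layer that needs only black-box access.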
Highlights:
• Backdoor attacks on DNNs using targeted universal adversarial perturbations are stealthy.
• Sequential analysis and metamorphosis testing for backdoor detection.
• The backdoor detection success rate is up to 86.3%.
• More generalization ability for detecting targeted universal adversarial perturbations.
ISSN: 0164-1212; 1873-1228
DOI: 10.1016/j.jss.2023.111859