Projan: A probabilistic trojan attack on deep neural networks

Bibliographic details
Published in:	Knowledge-Based Systems, 2024-11, Vol. 304, p. 112565, Article 112565
Main authors:	Saremi, Mehrin; Khalooei, Mohammad; Rastgoo, Razieh; Sabokrou, Mohammad
Format:	Article
Language:	eng
Subjects:	AI security; Backdoor attack; Deep learning; Probabilistic trojan attack; Trojan attack
Online access:	Full text

Description
Summary:	Deep neural networks have gained popularity due to their outstanding performance across various domains. However, because of their lack of explainability, they are vulnerable to several kinds of threats, including the trojan or backdoor attack, in which an adversary trains the model to respond to a crafted input pattern (also called a trigger) according to the adversary's will. Several trojan attack and defense methods have been proposed in the literature. Many of the defense methods assume that any trigger that exists must be able to affect the model’s behavior on all inputs, making it output a certain class label for each of them. In this work, we propose an alternative attack method that violates this assumption. Instead of a single trigger that works on all inputs, a few triggers are generated, each of which affects only some of the inputs. At attack time, the adversary will need to try more than one trigger to succeed, which might be possible in some real-world situations. Our experiments on the MNIST and CIFAR-10 datasets show that such an attack can be implemented successfully, reaching an attack success rate similar to that of the baseline methods BadNet and N-to-One. We also tested a wide range of defense methods and verified that, in general, this kind of backdoor is more difficult for defense algorithms to detect. The code is available at https://github.com/programehr/Projan.
•A trojan attack on neural networks is presented that uses more than one trigger.
•The attacker might need to try a few triggers before succeeding.
•While the attack success rate of each trigger is low, the total attack success rate is high.
•A low per-trigger attack success rate makes the attack more stealthy.
ISSN:	0950-7051
DOI:	10.1016/j.knosys.2024.112565
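
The distinction the summary draws between per-trigger and total attack success rate can be illustrated with a short sketch. It is not taken from the authors' code: the function name `attack_success_rates`, the layout of the `hits` matrix, and the toy outcomes are all assumptions made for illustration, and "total" is read here as the fraction of inputs on which at least one trigger succeeds.

```python
from typing import List, Sequence, Tuple


def attack_success_rates(
    hits: Sequence[Sequence[bool]],
) -> Tuple[List[float], float]:
    """Compute per-trigger and total attack success rates.

    hits[i][k] is True when stamping (hypothetical) trigger k on input i
    makes the model output the attacker's target label.
    """
    if not hits:
        raise ValueError("need at least one input")
    n_inputs = len(hits)
    n_triggers = len(hits[0])

    # Per-trigger rate: fraction of inputs on which trigger k works by itself.
    per_trigger = [
        sum(row[k] for row in hits) / n_inputs for k in range(n_triggers)
    ]
    # Total rate: fraction of inputs on which at least one trigger works,
    # i.e. the adversary succeeds after trying every trigger in turn.
    total = sum(any(row) for row in hits) / n_inputs
    return per_trigger, total


# Made-up outcomes for 5 inputs and 2 triggers (not results from the paper).
toy_hits = [
    [True, False],
    [False, True],
    [True, False],
    [False, True],
    [False, False],
]

per_trigger, total = attack_success_rates(toy_hits)
print(per_trigger, total)  # [0.4, 0.4] 0.8
```

In this toy case each trigger succeeds on only 40% of the inputs, yet trying both succeeds on 80%, which is the pattern the summary describes: a defense that looks for one trigger affecting all inputs sees only the low per-trigger rate.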