Wicked Oddities: Selectively Poisoning for Effective Clean-Label Backdoor Attacks
Saved in:
Main authors:
Format: Article
Language: eng
Subjects:
Online access: Request full text
Abstract: Deep neural networks are vulnerable to backdoor attacks, a type of adversarial attack that poisons the training data to manipulate the behavior of models trained on that data. Clean-label attacks are a stealthier form of backdoor attack that succeeds without changing the labels of the poisoned data. Early works on clean-label attacks added triggers to a random subset of the training set, ignoring the fact that samples contribute unequally to the attack's success; this results in high poisoning rates and low attack success rates. To alleviate this problem, several supervised-learning-based sample selection strategies have been proposed. However, these methods assume access to the entire labeled training set and require training, which is expensive and may not always be practical. This work studies a new and more practical (but also more challenging) threat model in which the attacker only provides data for the target class (e.g., in face recognition systems) and has no knowledge of the victim model or of any other classes in the training set. We study different strategies for selectively poisoning a small set of training samples in the target class to boost the attack success rate in this setting. Our threat model poses a serious risk when training machine learning models with third-party datasets, since the attack can be performed effectively with limited information. Experiments on benchmark datasets illustrate the effectiveness of our strategies in improving clean-label backdoor attacks.
DOI: 10.48550/arxiv.2407.10825
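
To make the setting concrete, below is a minimal, illustrative sketch in Python/NumPy of clean-label poisoning combined with one plausible selection heuristic: stamping a small patch trigger onto target-class images (labels untouched) and choosing the samples farthest from the class centroid in some feature space. The heuristic, function names, and parameters are assumptions for illustration only; the paper evaluates its own selection strategies, which this sketch does not reproduce.

```python
# Illustrative sketch of clean-label backdoor poisoning with a simple
# sample-selection heuristic. The selection rule (distance to the class
# centroid) and all names are assumptions, not the paper's method.
import numpy as np

def add_patch_trigger(image, patch_value=1.0, patch_size=3):
    """Stamp a small square trigger into the bottom-right corner.

    `image` is an HxWxC float array in [0, 1]. The label is left
    unchanged, which is what makes the attack clean-label.
    """
    poisoned = image.copy()
    poisoned[-patch_size:, -patch_size:, :] = patch_value
    return poisoned

def select_hard_samples(features, num_poison):
    """Pick the target-class samples farthest from the class centroid.

    Intuition (an assumption for this sketch): atypical, hard samples force
    the model to rely on the trigger, raising attack success at a low
    poisoning rate. `features` holds embeddings of target-class images
    only, so no other classes or the victim model are needed, matching
    the threat model described in the abstract.
    """
    centroid = features.mean(axis=0)
    distances = np.linalg.norm(features - centroid, axis=1)
    return np.argsort(distances)[-num_poison:]  # indices of hardest samples

# Usage on dummy data: poison 10 of 1000 target-class images (1% of the class).
rng = np.random.default_rng(0)
images = rng.random((1000, 32, 32, 3))    # stand-in for target-class images
features = rng.normal(size=(1000, 128))   # stand-in for pretrained embeddings
chosen = select_hard_samples(features, num_poison=10)
for i in chosen:
    images[i] = add_patch_trigger(images[i])  # labels stay untouched
```

The key design point the sketch mirrors is that selection uses only target-class data: the attacker never sees other classes or the victim model, yet can still rank its own samples by how atypical they are before attaching the trigger.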