Tracing the Origin of Adversarial Attack for Forensic Investigation and Deterrence
Saved in:

| Main Authors: | , , , , , |
| --- | --- |
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online Access: | Order full text |
| Abstract: | Deep neural networks are vulnerable to adversarial attacks. In this paper, we take the role of investigators who want to trace the attack and identify the source, that is, the particular model from which the adversarial examples were generated. Techniques derived would aid forensic investigation of attack incidents and serve as a deterrent to potential attacks. We consider the buyers-seller setting, where a machine learning model is distributed to various buyers and each buyer receives a slightly different copy with the same functionality. A malicious buyer generates adversarial examples from a particular copy $\mathcal{M}_i$ and uses them to attack other copies. From these adversarial examples, the investigator wants to identify the source $\mathcal{M}_i$. To address this problem, we propose a two-stage separate-and-trace framework. The model separation stage generates multiple copies of a model for the same classification task. This process injects unique characteristics into each copy so that the adversarial examples generated from each copy have distinct and traceable features. We give a parallel structure which embeds a "tracer" in each copy, and a noise-sensitive training loss to achieve this goal. The tracing stage takes in adversarial examples and a few candidate models, and identifies the likely source. Based on the unique features induced by the noise-sensitive loss function, we can effectively trace the potential adversarial copy by considering the output logits from each tracer. Empirical results show that it is possible to trace the origin of the adversarial examples and that the mechanism can be applied to a wide range of architectures and datasets. |
| --- | --- |
| DOI: | 10.48550/arxiv.2301.01218 |
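
The tracing stage described in the abstract can be pictured as a scoring loop over the candidate copies: feed the adversarial examples through each copy's embedded tracer and report the copy whose tracer responds most distinctively. The sketch below is an illustrative assumption only, not the paper's actual procedure: the `model.tracer` attribute and the mean-of-max-logits score are hypothetical stand-ins for whatever interface and decision rule the authors use.

```python
# Illustrative sketch of a logit-based tracing step (assumed interface, not
# the paper's exact method).
import torch

def trace_source(adv_examples: torch.Tensor, candidate_models) -> int:
    """Return the index of the candidate copy most likely to be the source.

    adv_examples:     batch of adversarial inputs, shape (N, ...).
    candidate_models: list of distributed copies; each is assumed
                      (hypothetically) to expose a `tracer` sub-network
                      that returns per-example logits.
    """
    scores = []
    with torch.no_grad():
        for model in candidate_models:
            model.eval()
            # Hypothetical interface: the tracer embedded in each copy yields
            # logits that are distinctive when the inputs were crafted against
            # that particular copy (the intended effect of the noise-sensitive
            # training loss).
            tracer_logits = model.tracer(adv_examples)  # shape (N, C)
            # Aggregate a per-copy responsiveness score; the mean of the
            # per-example maximum logit is one simple choice, used here only
            # for illustration.
            scores.append(tracer_logits.max(dim=1).values.mean().item())
    # The copy whose tracer reacts most strongly is reported as the likely source.
    return max(range(len(scores)), key=scores.__getitem__)
```

Under these assumptions the investigator only needs black-box access to each candidate copy's tracer outputs on the seized adversarial examples; how the per-copy scores are actually computed and compared is specified in the paper itself.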