The "Beatrix" Resurrections: Robust Backdoor Detection via Gram Matrices
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Deep Neural Networks (DNNs) are susceptible to backdoor attacks during
training. The model corrupted in this way functions normally, but when
triggered by certain patterns in the input, produces a predefined target label.
Existing defenses usually rely on the assumption of the universal backdoor
setting in which poisoned samples share the same uniform trigger. However,
recent advanced backdoor attacks show that this assumption is no longer valid
in dynamic backdoors where the triggers vary from input to input, thereby
defeating the existing defenses.
In this work, we propose a novel technique, Beatrix (backdoor detection via
Gram matrix). Beatrix uses Gram matrices to capture not only the feature
correlations but also the appropriately high-order information of the
representations. By learning class-conditional statistics from activation
patterns of normal samples, Beatrix can identify poisoned samples by capturing
the anomalies in activation patterns. To further improve the performance in
identifying target labels, Beatrix leverages kernel-based testing without
making any prior assumptions on representation distribution. We demonstrate the
effectiveness of our method through extensive evaluation and comparison with
state-of-the-art defensive techniques. The experimental results show that our
approach achieves an F1 score of 91.1% in detecting dynamic backdoors, while
the state of the art can only reach 36.9%. |
---|---|
DOI: | 10.48550/arxiv.2209.11715 |
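
The summary describes the detection idea only in words, so a minimal sketch may help make it concrete. The code below is not the authors' implementation; it is an illustration under stated assumptions. It assumes activations are non-negative (for example, taken after a ReLU), uses Gram matrices of orders 1 to 3 as the "high-order" features, and summarizes each class with a per-feature median and median absolute deviation (MAD). The function names, the tolerance `k`, and the synthetic data are all hypothetical. The kernel-based test for identifying target labels is not covered here.

```python
import numpy as np


def gram_features(acts, orders=(1, 2, 3)):
    """Flatten p-th order Gram matrices of one sample's activations.

    acts: array of shape (channels, positions), assumed non-negative.
    For each order p, the features are raised to the power p, multiplied
    with their transpose, and the p-th root is taken to restore the scale.
    """
    iu = np.triu_indices(acts.shape[0])  # Gram matrices are symmetric
    feats = []
    for p in orders:
        a = acts ** p
        g = a @ a.T
        g = np.sign(g) * np.abs(g) ** (1.0 / p)
        feats.append(g[iu])
    return np.concatenate(feats)


def fit_class_stats(clean_acts):
    """Per-feature median and MAD over clean samples of a single class."""
    feats = np.stack([gram_features(a) for a in clean_acts])
    med = np.median(feats, axis=0)
    mad = np.median(np.abs(feats - med), axis=0) + 1e-8  # avoid division by zero
    return med, mad


def deviation(acts, med, mad, k=3.0):
    """Total amount by which features leave the band med +/- k * mad, in MAD units."""
    f = gram_features(acts)
    excess = np.maximum(np.abs(f - med) - k * mad, 0.0)
    return float(np.sum(excess / mad))


# Synthetic demonstration: clean activations versus a sample in which a
# (hypothetical) trigger strongly excites two channels.
rng = np.random.default_rng(0)
clean = [np.abs(rng.normal(size=(4, 32))) for _ in range(200)]
med, mad = fit_class_stats(clean)

held_out = np.abs(rng.normal(size=(4, 32)))
poisoned = held_out.copy()
poisoned[:2] += 5.0

clean_score = deviation(held_out, med, mad)
poisoned_score = deviation(poisoned, med, mad)
print(clean_score, poisoned_score)
```

In this toy setting the poisoned sample receives a much larger deviation score than the held-out clean sample, so a threshold calibrated on clean data would flag it. Real activations, layer choices, and thresholds would need to follow the paper itself.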