A Textual Backdoor Defense Method Based on Deep Feature Classification

Bibliographic details
Published in:	Entropy (Basel, Switzerland), 2023-01, Vol.25 (2), p.220
Main authors:	Shao, Kun; Yang, Junan; Hu, Pengjiang; Li, Xiaoshuai
Format:	Article
Language:	English
Subjects:	adversarial machine learning; Artificial neural networks; backdoor attacks; backdoor defenses; Classification; Computational linguistics; Datasets; deep neural networks; Defense; Effectiveness; Feature extraction; Language processing; Methods; Natural language interfaces; Natural language processing; Neural networks; Optimization; Technology application; Text processing
Online access:	Full text

Description
Abstract:	Natural language processing (NLP) models based on deep neural networks (DNNs) are vulnerable to backdoor attacks. Existing backdoor defense methods have limited effectiveness and cover only a limited range of scenarios. We propose a textual backdoor defense method based on deep feature classification. The method consists of deep feature extraction and classifier construction, and it exploits the distinguishability of the deep features of poisoned data and benign data. Backdoor defense is implemented in both offline and online scenarios. We conducted defense experiments on two datasets and two models against a variety of backdoor attacks. The experimental results demonstrate the effectiveness of this defense approach and show that it outperforms the baseline defense method.
ISSN:	1099-4300
DOI:	10.3390/e25020220