Improvements to adversarial training for text classification tasks

Bibliographic details
Published in: Journal of Intelligent & Fuzzy Systems, 2024-02, Vol. 46 (2), pp. 5191-5202
Main authors: He, Jia-long; Zhang, Xiao-Lin; Wang, Yong-Ping; Gu, Rui-Chun; Liu, Li-xin; Xu, En-Hui
Format: Article
Language: English
Online access: Full text
Description
Abstract: Although deep learning models perform strongly, they are still easily deceived by adversarial samples. Some methods for generating adversarial samples are slow, which makes them costly to use in adversarial training, and existing adversarial training methods adapt poorly to the dynamic nature of the model, so designing an efficient adversarial training method remains challenging. In this paper, we propose an adversarial training method built on two components: AGFAT, an improved adversarial sample generation method for adversarial training, and AGFAT-DAT, an improved dynamic adversarial training method. AGFAT uses a word frequency-based approach to identify significant words, filters replacement candidates, and applies an efficient semantic constraint module to reduce the time needed to generate adversarial samples. AGFAT-DAT is a dynamic adversarial training approach that cyclically attacks the adversarially trained model and generates new adversarial samples for further adversarial training. We show that the proposed method significantly reduces the generation time of adversarial samples and that the adversarially trained model also defends effectively against other types of word-level adversarial attacks.
ISSN: 1064-1246
EISSN: 1875-8967
DOI: 10.3233/JIFS-234034
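
The attack-then-retrain cycle described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' AGFAT or AGFAT-DAT implementation: the bag-of-words model, the `SYNONYMS` table, the `SAMPLES` data, and all function names are hypothetical, a fixed synonym table stands in for the candidate filter and semantic constraint module, and ranking words from most to least frequent is an assumption, since the abstract does not say how frequency is used.

```python
from collections import Counter

# Hypothetical synonym table standing in for candidate filtering and the
# semantic constraint module; the record does not describe either in detail.
SYNONYMS = {"good": ["fine", "nice"], "bad": ["poor", "awful"]}

# Hypothetical toy dataset: (text, label) with 1 = positive, 0 = negative.
SAMPLES = [("good movie", 1), ("bad movie", 0), ("good plot", 1), ("bad plot", 0)]


def train(samples):
    """Fit a toy bag-of-words model: each word scores +1 per positive
    occurrence and -1 per negative occurrence."""
    weights = Counter()
    for text, label in samples:
        for word in text.split():
            weights[word] += 1 if label == 1 else -1
    return weights


def predict(weights, text):
    """Return 1 if the summed word scores are positive, otherwise 0."""
    return 1 if sum(weights[w] for w in text.split()) > 0 else 0


def attack(weights, text, label, freq):
    """Single-word substitution attack. Words are visited from most to least
    frequent (an assumed reading of 'word frequency-based'); the first
    synonym swap that flips the prediction is returned, else None."""
    words = text.split()
    order = sorted(range(len(words)), key=lambda i: -freq[words[i]])
    for i in order:
        for candidate in SYNONYMS.get(words[i], []):
            trial = " ".join(words[:i] + [candidate] + words[i + 1:])
            if predict(weights, trial) != label:
                return trial
    return None


def dynamic_adversarial_training(samples, rounds=3):
    """Repeatedly attack the current model, add the successful adversarial
    samples to the training data, and retrain."""
    data = list(samples)
    freq = Counter(w for text, _ in samples for w in text.split())
    model = train(data)
    for _ in range(rounds):
        adversarial = []
        for text, label in samples:
            if predict(model, text) != label:
                continue  # only attack samples the model currently gets right
            adv = attack(model, text, label, freq)
            if adv is not None and (adv, label) not in data:
                adversarial.append((adv, label))
        if not adversarial:
            break  # the attack no longer succeeds against this model
        data.extend(adversarial)
        model = train(data)
    return model, data
```

With this toy data, a model trained once on `SAMPLES` labels "fine movie" as negative, while the model returned by `dynamic_adversarial_training(SAMPLES)` labels it positive. Each round exposes weaknesses that only appear after the previous retraining, which is the motivation for attacking the model cyclically rather than once.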