Automatic Generation of Adversarial Readable Chinese Texts

Natural language processing (NLP) models are known vulnerable to adversarial examples, similar to image processing models. Studying adversarial texts is an essential step to improve the robustness of NLP models. However, existing studies mainly focus on generating adversarial texts for English, with...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on dependable and secure computing 2023-03, Vol.20 (2), p.1756-1770
Hauptverfasser:	Liu, Mingxuan, Zhang, Zihan, Zhang, Yiming, Zhang, Chao, Li, Zhou, Li, Qi, Duan, Haixin, Sun, Donghong
Format:	Artikel
Sprache:	eng
Schlagworte:	Electronic mail generation of adversarial example Image processing Natural language processing Perturbation methods Readability Robustness Semantics Shape Success Task analysis Texts Vocabulary
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Natural language processing (NLP) models are known vulnerable to adversarial examples, similar to image processing models. Studying adversarial texts is an essential step to improve the robustness of NLP models. However, existing studies mainly focus on generating adversarial texts for English, with no prior knowledge that whether those attacks could be applied to Chinese. After analyzing the differences between Chinese and English, we propose a novel adversarial Chinese text generation solution Argot, by utilizing the method for adversarial English examples and several novel methods developed on Chinese characteristics. Argot could effectively and efficiently generate adversarial Chinese texts with good readability in both white-box and black-box settings. Argot could also automatically generate targeted Chinese adversarial texts, achieving a high success rate and ensuring the readability of the generated texts. Furthermore, we apply Argot to the spam detection task in both local detection models and a public toxic content detection system from a well-known security company. Argot achieves a relatively high bypass success rate with fluent readability, which proves that the real-world toxic content detection system is vulnerable to adversarial example attacks. We also evaluate some available defense strategies, and the results indicate that Argot can still achieve high attack success rates.
ISSN:	1545-5971 1941-0018
DOI:	10.1109/TDSC.2022.3164289