Disentangled Text Representation Learning With Information-Theoretic Perspective for Adversarial Robustness
Saved in:
Published in: | IEEE/ACM transactions on audio, speech, and language processing, 2024, Vol.32, p.1237-1247 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Order full text |
Abstract: | Adversarial vulnerability remains a major obstacle to the construction of reliable NLP systems. When imperceptible perturbations are added to raw input text, the performance of a deep learning model may drop dramatically under attack. Recent work has argued that a model's adversarial vulnerability is caused by non-robust features learned during supervised training. In this paper, we therefore tackle the adversarial robustness challenge by means of disentangled representation learning, which explicitly disentangles robust and non-robust features in text. Specifically, inspired by the variation of information (VI) from information theory, we derive a disentangled learning objective composed of mutual information terms that capture both the semantic representativeness of the latent embeddings and the differentiation of robust and non-robust features. On this basis, we design a disentangled learning network that estimates the mutual information to realize the objective. Experiments on typical text-based tasks show that our method significantly outperforms representative methods under adversarial attacks, indicating that discarding non-robust features is critical for improving model robustness. |
---|---|
ISSN: | 2329-9290; 2329-9304 |
DOI: | 10.1109/TASLP.2024.3358052 |
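The abstract's objective builds on the variation of information, an information-theoretic distance defined as VI(X; Y) = H(X) + H(Y) - 2·I(X; Y). The paper's actual learning network is not reproduced here; the snippet below is only a minimal, self-contained sketch of how VI can be computed for two discrete variables from their joint distribution, using illustrative toy distributions of my own choosing (not data from the paper).

```python
import numpy as np

def entropy(p):
    """Shannon entropy in bits of a probability vector (zero entries ignored)."""
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def variation_of_information(joint):
    """VI(X; Y) = H(X) + H(Y) - 2*I(X; Y), given the joint distribution P(X, Y)."""
    px = joint.sum(axis=1)           # marginal of X
    py = joint.sum(axis=0)           # marginal of Y
    hx, hy = entropy(px), entropy(py)
    hxy = entropy(joint.ravel())     # joint entropy H(X, Y)
    mi = hx + hy - hxy               # mutual information I(X; Y)
    return hx + hy - 2 * mi

# Perfectly correlated fair bits: I(X;Y) = 1 bit, so VI = 0 (identical variables).
joint_same = np.array([[0.5, 0.0],
                       [0.0, 0.5]])

# Independent fair bits: I(X;Y) = 0, so VI = H(X) + H(Y) = 2 bits (maximally different).
joint_indep = np.full((2, 2), 0.25)

print(variation_of_information(joint_same))   # → 0.0
print(variation_of_information(joint_indep))  # → 2.0
```

VI vanishes when the two variables determine each other and grows as they decouple, which is why it is a natural yardstick for separating robust from non-robust feature representations as the abstract describes.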