Towards Building a Robust Toxicity Predictor

Recent NLP literature pays little attention to the robustness of toxicity language predictors, even though these systems are the ones most likely to be used in adversarial contexts. This paper presents a novel adversarial attack, ToxicTrap, which introduces small word-level perturbations to fool SOTA text clas...

Bibliographic Details
Published in:	arXiv.org 2024-04
Main authors:	Bespalov, Dmitriy; Bhabesh, Sourav; Xiang, Yi; Zhou, Liutong; Qi, Yanjun
Format:	Article
Language:	English
Subjects:	Classifiers; Robustness
Online access:	Full text