Enhancing Machine-Generated Text Detection: Adversarial Fine-Tuning of Pre-Trained Language Models

Bibliographic Details
Published in:	IEEE Access, 2024, Vol. 12, pp. 65333-65340
Main Authors:	Lee, Dong Hee; Jang, Beakcheol
Format:	Article
Language:	English
Subjects:	Adversarial machine learning; Adversarial training; Data models; Encoding; False information; Language; Large language models; Machine generated text detection; Natural language processing; Solid modeling; Task analysis; Text categorization; Text classification; Text detection; Training
Online Access:	Full text

Description
Summary:	Advances in large language models (LLMs) have revolutionized the natural language processing field. However, the text generated by LLMs can result in various issues, such as fake news, misinformation, and social media spam. In addition, detecting machine-generated text is becoming increasingly difficult because LLMs produce text that resembles human writing. We propose a new method for effectively detecting machine-generated text by applying adversarial training (AT) to pre-trained language models (PLMs), such as Bidirectional Encoder Representations from Transformers (BERT). We generated adversarial examples that appeared to have been modified by humans and applied them to the PLMs to improve the model's detection capabilities. The proposed method was validated on various datasets and experiments. It showed improved performance compared to traditional fine-tuning methods, with an average reduction in the probability of misclassification of machine-generated text by about 10%. We demonstrated the robustness of the model when generated with input tokens of different lengths and under different training data ratios. We suggested future research directions for applying AT to different languages and language model types. This study opens new possibilities for applying AT to the problem of machine-generated text detection and classification and contributes to building more effective detection models.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3396820