A Survey on Data Augmentation for Text Classification

Bibliographic details
Published in:	ACM Computing Surveys 2023-07, Vol.55 (7), p.1-39, Article 146
Main authors:	Bayer, Markus; Kaufhold, Marc-André; Reuter, Christian
Format:	Article
Language:	English
Subjects:	Adversarial learning; Classification; Computer science; Computing methodologies; Data augmentation; Machine learning; Natural language processing; Neural networks; Regularization; Supervised learning by classification; Taxonomy; Training
Online access:	Full text

Description
Abstract:	Data augmentation, the artificial creation of training data for machine learning by transformations, is a widely studied research field across machine learning disciplines. While it is useful for increasing a model's generalization capabilities, it can also address many other challenges and problems, from overcoming a limited amount of training data, to regularizing the objective, to limiting the amount of data used to protect privacy. Based on a precise description of the goals and applications of data augmentation and a taxonomy for existing works, this survey is concerned with data augmentation methods for textual classification and aims at providing a concise and comprehensive overview for researchers and practitioners. Derived from the taxonomy, we divide more than 100 methods into 12 different groupings and give state-of-the-art references expounding which methods are highly promising by relating them to each other. Finally, research perspectives that may constitute a building block for future work are provided.
ISSN:	0360-0300, 1557-7341
DOI:	10.1145/3544558