Tailored text augmentation for sentiment analysis

In synonym replacement-based data augmentation techniques for natural language processing tasks, words in a sentence are often sampled randomly with equal probability. In this paper, we propose a novel data augmentation technique named Tailored Text Argumentation (TTA) for sentiment analysis. It has...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Expert systems with applications 2022-11, Vol.205, p.117605, Article 117605
Hauptverfasser: Feng, Zijian, Zhou, Hanzhang, Zhu, Zixiao, Mao, Kezhi
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In synonym replacement-based data augmentation techniques for natural language processing tasks, words in a sentence are often sampled randomly with equal probability. In this paper, we propose a novel data augmentation technique named Tailored Text Argumentation (TTA) for sentiment analysis. It has two main operations. The first operation is the probabilistic word sampling for synonym replacement based on the discriminative power and relevance of the word to sentiment. The second operation is the identification of words irrelevant to sentiment but discriminative for the training data, and application of zero masking or contextual replacement to these words. The first operation expands the coverage of discriminative words, while the second operation alleviates the problem of misfitting. Both operations tend to improve the model’s generalization capability. Extensive experiments on simulated low-data regimes demonstrate that TTA yields notable improvements over six strong baselines. Finally, TTA is applied to public sentiment analysis on measures against Covid-19, which again proves the effectiveness of the new data augmentation algorithm. •A novel data augmentation algorithm for sentiment analysis.•Word sampling and replacing based on discriminative power and relevance to sentiment.•Application of the algorithm to sentiment analysis on polices against COVID-19.
ISSN:0957-4174
1873-6793
DOI:10.1016/j.eswa.2022.117605