SRL-ACO: A text augmentation framework based on semantic role labeling and ant colony optimization
Published in: Journal of King Saud University - Computer and Information Sciences, 2023-07, Vol. 35 (7), p. 101611, Article 101611
Author:
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: Creating high-quality labeled data is crucial for training machine-learning models, but it is time-consuming and labor-intensive. Moreover, manual annotation by humans of varying competency, training, and experience can result in inconsistent labeling and arbitrary standards. To address these challenges, researchers have been exploring automated methods for enhancing training and testing datasets. This paper proposes SRL-ACO, a novel text augmentation framework that leverages Semantic Role Labeling (SRL) and Ant Colony Optimization (ACO) to generate additional training data for natural language processing (NLP) models. The framework uses SRL to identify the semantic roles of words in a sentence and ACO to generate new sentences that preserve those roles. SRL-ACO can improve the accuracy of NLP models by generating additional data without manual annotation. The paper presents experimental results demonstrating the effectiveness of SRL-ACO on seven text classification datasets covering sentiment analysis, toxic text detection, and sarcasm identification. The results show that SRL-ACO improves classifier performance across these NLP tasks, demonstrating its potential to enhance both the quality and quantity of training data.
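The abstract describes the pipeline only at a high level: an SRL tagger marks predicate-argument structure, and an ACO-style search proposes word substitutions that leave that structure intact. The minimal Python sketch below illustrates that idea; it is not the authors' implementation. The `label_roles` stub, the `SYNONYMS` pool, the diversity-based fitness, and the pheromone update rule are all illustrative assumptions (a real system would use a trained SRL model, a richer candidate generator, and a fitness such as semantic similarity to the source sentence).

```python
import random

# Hypothetical stand-ins: a real system would use a trained SRL model
# and a learned substitution-candidate pool rather than a fixed dict.
SYNONYMS = {
    "movie": ["film", "picture"],
    "great": ["superb", "fantastic"],
}

def label_roles(tokens):
    """Toy SRL stub: mark the copula as the predicate ('V'),
    everything else as an argument ('ARG')."""
    return ["V" if t == "was" else "ARG" for t in tokens]

def aco_augment(sentence, n_ants=10, n_iters=20, evaporation=0.1, seed=0):
    """ACO-style search over synonym substitutions that pins tokens
    with role 'V' so the predicate-argument structure is preserved."""
    rng = random.Random(seed)
    tokens = sentence.split()
    roles = label_roles(tokens)
    # One pheromone value per (position, candidate word) pair.
    pheromone = {
        (i, w): 1.0
        for i, t in enumerate(tokens)
        for w in [t] + SYNONYMS.get(t, [])
    }
    best, best_score = tokens, 0.0
    for _ in range(n_iters):
        for _ant in range(n_ants):
            new = []
            for i, (t, r) in enumerate(zip(tokens, roles)):
                # Role-bearing predicates stay fixed; arguments may vary.
                cands = [t] if r == "V" else [t] + SYNONYMS.get(t, [])
                weights = [pheromone[(i, c)] for c in cands]
                new.append(rng.choices(cands, weights=weights)[0])
            # Toy fitness: reward lexical diversity from the original.
            score = sum(a != b for a, b in zip(new, tokens)) / len(tokens)
            if score > best_score:
                best, best_score = new, score
            # Deposit pheromone on the choices this ant made.
            for i, w in enumerate(new):
                pheromone[(i, w)] += score
        # Evaporate so early choices do not dominate forever.
        for k in pheromone:
            pheromone[k] *= 1.0 - evaporation
    return " ".join(best)

print(aco_augment("the movie was great"))
```

Running the sketch yields a paraphrase such as `the film was superb`: the predicate `was` is pinned by its role, so argument words vary while the role structure survives, which is the property the framework relies on to keep augmented examples consistent with their original labels.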
ISSN: 1319-1578, 2213-1248
DOI: 10.1016/j.jksuci.2023.101611