Controllable Data Augmentation for Few-Shot Text Mining with Chain-of-Thought Attribute Manipulation
Saved in:
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Prompting large language models (LLMs) for data augmentation has recently become a common practice in few-shot NLP tasks. In this paper, we propose Chain-of-Thought Attribute Manipulation (CoTAM), a novel approach that generates new data from existing examples by tweaking only a user-provided, task-specific attribute, e.g., sentiment polarity or topic in movie reviews. Instead of conventional latent-representation control, we leverage chain-of-thought prompting to directly edit the text in three steps: (1) attribute decomposition, (2) manipulation proposal, and (3) sentence reconstruction. Extensive results on various tasks, such as text (pair) classification, aspect-based sentiment analysis, and conditional text generation, verify the superiority of CoTAM over other LLM-based augmentation methods with the same number of training examples for both fine-tuning and in-context learning. Remarkably, a 2D visualization of the augmented dataset using principal component analysis revealed a human-recognizable decision boundary that is likely hinted at by the attribute manipulation, demonstrating the potential of our proposed approach.
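The three prompting steps named in the abstract (attribute decomposition, manipulation proposal, sentence reconstruction) can be sketched as a single chain-of-thought prompt template. The wording below is a hypothetical illustration, not the paper's actual prompt, and the LLM call itself is omitted.

```python
def build_cotam_prompt(sentence: str, attribute: str, target: str) -> str:
    """Assemble a chain-of-thought prompt walking an LLM through the three
    CoTAM steps. The exact phrasing here is an assumption for illustration."""
    return (
        f'"{sentence}"\n'
        "Please think step by step:\n"
        # Step 1: attribute decomposition
        f"1. What attributes, including {attribute}, does the sentence above have?\n"
        # Step 2: manipulation proposal
        f"2. How would you write a similar sentence whose {attribute} is "
        f"{target}, keeping every other attribute unchanged?\n"
        # Step 3: sentence reconstruction
        "3. Write that sentence.\n"
    )

prompt = build_cotam_prompt(
    "The movie was a delightful surprise from start to finish.",
    attribute="sentiment polarity",
    target="negative",
)
```

Sent to an LLM, such a prompt is meant to elicit a decomposition of the example, a proposed edit to the single target attribute, and a reconstructed sentence usable as an augmented training example.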
DOI: 10.48550/arxiv.2307.07099