Syntax-driven Data Augmentation for Named Entity Recognition
Format: | Article |
---|---|
Language: | English |
Abstract: | In low-resource settings, data augmentation strategies are commonly leveraged
to improve model performance. Numerous approaches have targeted document-level
augmentation (e.g., for text classification), but few studies have explored
token-level augmentation. Performed naively, data augmentation can produce
semantically incongruent and ungrammatical examples. In this work, we compare
simple masked language model replacement with an augmentation method based on
constituency tree mutations for improving named entity recognition in
low-resource settings, with the aim of preserving the linguistic cohesion of
the augmented sentences. |
---|---|
DOI: | 10.48550/arxiv.2208.06957 |
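The abstract does not specify which constituency tree mutations are used. Purely as an illustration, the sketch below assumes one plausible operation: replacing a constituent (e.g. an NP) in one parsed sentence with a same-label constituent taken from another sentence, carrying each token's BIO tag along so the entity annotations move with the words, and rejecting any result whose BIO sequence is invalid. The `Node`/`Leaf` classes, `swap_constituent`, and the hand-built trees are hypothetical; a real pipeline would obtain trees from a constituency parser, and this is not presented as the authors' method.

```python
import copy
import random
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple, Union


@dataclass
class Leaf:
    token: str
    tag: str  # BIO-style NER tag, e.g. "B-PER", "I-PER", "O"


@dataclass
class Node:
    label: str  # constituent label, e.g. "S", "NP", "VP"
    children: List[Union["Node", Leaf]] = field(default_factory=list)


Tree = Union[Node, Leaf]
Path = Tuple[int, ...]  # sequence of child indices from the root


def leaves(tree: Tree) -> List[Leaf]:
    """Return the leaves left to right, i.e. the tagged sentence."""
    if isinstance(tree, Leaf):
        return [tree]
    return [leaf for child in tree.children for leaf in leaves(child)]


def subtrees(tree: Tree, path: Path = ()) -> Iterator[Tuple[Path, Node]]:
    """Yield (path, node) for every internal node in pre-order."""
    if isinstance(tree, Node):
        yield path, tree
        for i, child in enumerate(tree.children):
            yield from subtrees(child, path + (i,))


def replace_at(tree: Tree, path: Path, new: Tree) -> Tree:
    """Return a deep copy of `tree` with the node at `path` replaced by a copy of `new`."""
    if not path:
        return copy.deepcopy(new)
    result = copy.deepcopy(tree)  # never modify the input tree
    parent = result
    for i in path[:-1]:
        parent = parent.children[i]
    parent.children[path[-1]] = copy.deepcopy(new)
    return result


def bio_is_valid(tags: List[str]) -> bool:
    """An I-X tag must directly follow a B-X or I-X tag of the same type X."""
    prev = "O"
    for tag in tags:
        if tag.startswith("I-") and prev[2:] != tag[2:]:
            return False
        prev = tag
    return True


def swap_constituent(
    target: Node, donor: Node, label: str, rng: random.Random
) -> Optional[Node]:
    """Hypothetical mutation: put a random `label` constituent from `donor`
    in place of a random non-root `label` constituent of `target`.
    Returns None if no candidate exists or the BIO tags become invalid."""
    slots = [p for p, n in subtrees(target) if n.label == label and p]
    fillers = [n for _, n in subtrees(donor) if n.label == label]
    if not slots or not fillers:
        return None
    mutated = replace_at(target, rng.choice(slots), rng.choice(fillers))
    if not bio_is_valid([leaf.tag for leaf in leaves(mutated)]):
        return None
    return mutated


# Usage with two hand-built (hypothetical) parses.
a = Node("S", [
    Node("NP", [Leaf("Alice", "B-PER")]),
    Node("VP", [Leaf("visited", "O"), Node("NP", [Leaf("Paris", "B-LOC")])]),
])
b = Node("S", [
    Node("NP", [Leaf("The", "O"), Leaf("old", "O"), Leaf("museum", "O")]),
    Node("VP", [Leaf("closed", "O")]),
])
out = swap_constituent(a, b, "NP", random.Random(0))
if out is not None:
    print([(leaf.token, leaf.tag) for leaf in leaves(out)])
```

Because whole constituents are moved rather than individual tokens, the output tends to stay syntactically well-formed; the BIO check only guards against broken entity spans and does not address semantic incongruity.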