Adapting High-resource NMT Models to Translate Low-resource Related Languages without Parallel Data
Saved in:
Main authors: , , , , , , , ,
Format: Article
Language: English
Keywords:
Online access: Order full text
Abstract: The scarcity of parallel data is a major obstacle for training high-quality machine translation systems for low-resource languages. Fortunately, some low-resource languages are linguistically related or similar to high-resource languages; these related languages may share many lexical or syntactic structures. In this work, we exploit this linguistic overlap to facilitate translating to and from a low-resource language using only monolingual data, in addition to any parallel data in the related high-resource language. Our method, NMT-Adapt, combines denoising autoencoding, back-translation, and adversarial objectives to utilize monolingual data for low-resource adaptation. We experiment on 7 languages from three different language families and show that our technique significantly improves translation into the low-resource language compared to other translation baselines.
DOI: 10.48550/arxiv.2105.15071
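The abstract states that NMT-Adapt combines denoising autoencoding, back-translation, and adversarial objectives over monolingual data. Below is a minimal sketch, under stated assumptions, of how such signals could be combined into a single training loss. It is not the paper's implementation: the seq2seq interface (`encode`, `decode_loss`), the `LangDiscriminator` module, and the batch field names are hypothetical placeholders.

```python
# Hedged sketch (not the authors' code) of combining three training signals
# on monolingual low-resource data: denoising autoencoding, back-translation,
# and an adversarial language-invariance objective.
import torch
import torch.nn as nn


class LangDiscriminator(nn.Module):
    """Predicts the language of pooled encoder states; the encoder can be
    trained against it so representations become language-invariant."""

    def __init__(self, hidden_dim: int, num_langs: int = 2):
        super().__init__()
        self.classifier = nn.Sequential(
            nn.Linear(hidden_dim, hidden_dim),
            nn.ReLU(),
            nn.Linear(hidden_dim, num_langs),
        )

    def forward(self, enc_states: torch.Tensor) -> torch.Tensor:
        # enc_states: (batch, time, hidden) -> mean-pool over time, then classify.
        return self.classifier(enc_states.mean(dim=1))


def training_step(model, discriminator, batch, lambdas=(1.0, 1.0, 0.1)):
    """One illustrative update on monolingual low-resource text.

    `model` is a hypothetical seq2seq with `encode(ids, lang)` returning
    encoder states and `decode_loss(enc, tgt_ids, lang)` returning a
    cross-entropy loss. `batch` holds clean low-resource ids ('lrl_ids'),
    a corrupted copy ('noisy_lrl_ids'), and a synthetic high-resource
    source obtained by back-translation ('bt_hrl_ids').
    """
    l_dae, l_bt, l_adv = lambdas
    ce = nn.CrossEntropyLoss()

    # 1) Denoising autoencoding: reconstruct the clean sentence from noise.
    enc_noisy = model.encode(batch["noisy_lrl_ids"], lang="lrl")
    loss_dae = model.decode_loss(enc_noisy, batch["lrl_ids"], lang="lrl")

    # 2) Back-translation: translate the synthetic high-resource source
    #    back into the original low-resource sentence.
    enc_bt = model.encode(batch["bt_hrl_ids"], lang="hrl")
    loss_bt = model.decode_loss(enc_bt, batch["lrl_ids"], lang="lrl")

    # 3) Adversarial objective: push low-resource encodings toward the
    #    high-resource label (index 0) so both languages share one space.
    enc_clean = model.encode(batch["lrl_ids"], lang="lrl")
    target = torch.zeros(enc_clean.size(0), dtype=torch.long)
    loss_adv = ce(discriminator(enc_clean), target)

    return l_dae * loss_dae + l_bt * loss_bt + l_adv * loss_adv
```

In practice the discriminator would be trained in alternation with the encoder (as in standard adversarial setups), and the weighting `lambdas` is an assumed hyperparameter, not a value reported in the paper.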