Incorporating Bilingual Dictionaries for Low Resource Semi-Supervised Neural Machine Translation
Main authors: | , , , |
---|---|
Format: | Article |
Language: | English |
Keywords: | |
Abstract: | We explore ways of incorporating bilingual dictionaries to enable
semi-supervised neural machine translation. Conventional back-translation
methods have shown success in leveraging target-side monolingual data. However,
since the quality of back-translation models is tied to the size of the
available parallel corpora, this could adversely impact the synthetically
generated sentences in a low-resource setting. We propose a simple data
augmentation technique to address this shortcoming. We incorporate widely
available bilingual dictionaries that yield word-by-word translations to
generate synthetic sentences. This automatically expands the vocabulary of the
model while maintaining high-quality content. Our method shows an appreciable
improvement in performance over strong baselines. |
---|---|
DOI: | 10.48550/arxiv.2004.02071 |
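The abstract describes the dictionary-based augmentation only at a high level. The following is a minimal sketch of the general idea (word-by-word translation of target-side monolingual sentences into synthetic source sentences), not the authors' implementation. It assumes whitespace tokenization, a target-to-source lexicon mapping each word to a list of candidate translations, that the first listed sense is chosen, and that words missing from the dictionary are copied through unchanged. The function names and the toy lexicon are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def dictionary_translate(sentence: str, lexicon: Dict[str, List[str]]) -> str:
    """Translate word by word using a bilingual lexicon (hypothetical helper).

    Assumptions: whitespace tokenization, case-insensitive lookup,
    first dictionary sense is used, and unknown tokens are copied as-is.
    """
    out = []
    for token in sentence.split():
        senses = lexicon.get(token.lower())
        out.append(senses[0] if senses else token)
    return " ".join(out)


def augment(monolingual_tgt: List[str],
            tgt_to_src: Dict[str, List[str]]) -> List[Tuple[str, str]]:
    """Pair each target-side sentence with a synthetic source sentence."""
    return [(dictionary_translate(s, tgt_to_src), s) for s in monolingual_tgt]


def vocabulary_gain(pairs: List[Tuple[str, str]], known_src_vocab: Set[str]) -> Set[str]:
    """Source-side tokens introduced by the synthetic data but absent from the known vocabulary."""
    synthetic_tokens = {tok for src, _ in pairs for tok in src.split()}
    return synthetic_tokens - known_src_vocab


if __name__ == "__main__":
    # Toy English -> German lexicon, purely illustrative.
    lexicon = {"the": ["der"], "cat": ["Katze"], "sleeps": ["schläft"]}
    pairs = augment(["the cat sleeps"], lexicon)
    print(pairs)
    print(vocabulary_gain(pairs, {"der"}))
```

The resulting pairs are ungrammatical on the source side (word order and agreement are not modeled), which is the expected trade-off of word-by-word translation; the `vocabulary_gain` helper illustrates how dictionary entries can introduce source words that a small parallel corpus never contained.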