A Mongolian–Chinese Neural Machine Translation Method Based on Semantic-Context Data Augmentation
Neural machine translation (NMT) typically relies on a substantial number of bilingual parallel corpora for effective training. Mongolian, as a low-resource language, has relatively few parallel corpora, resulting in poor translation performance. Data augmentation (DA) is a practical and promising m...
Gespeichert in:
Veröffentlicht in: | Applied sciences 2024-04, Vol.14 (8), p.3442 |
---|---|
Hauptverfasser: | , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Neural machine translation (NMT) typically relies on a substantial number of bilingual parallel corpora for effective training. Mongolian, as a low-resource language, has relatively few parallel corpora, resulting in poor translation performance. Data augmentation (DA) is a practical and promising method to solve problems related to data sparsity and single semantic structure by expanding the size and structure of available data. In order to address the issues of data sparsity and semantic inconsistency in Mongolian–Chinese NMT processes, this paper proposes a new semantic-context DA method. This method adds an additional semantic encoder based on the original translation model, which utilizes both source and target sentences to generate different semantic vectors to enhance each training instance. The results show that this method significantly improves the quality of Mongolian–Chinese NMT tasks, with an increase of approximately 2.5 BLEU values compared to the basic Transformer model. Compared to the basic model, this method can achieve the same translation results with about half of the data, greatly improving translation efficiency. |
---|---|
ISSN: | 2076-3417 2076-3417 |
DOI: | 10.3390/app14083442 |