Research on Uyghur-Chinese Neural Machine Translation Based on the Transformer at Multistrategy Segmentation Granularity
Published in: | Mobile Information Systems 2021, Vol.2021, p.1-7 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | In recent years, neural machine translation has become the mainstream approach in machine translation, but low-resource translation still suffers from insufficient parallel corpora and sparse data. Existing machine translation models are usually trained on datasets segmented at word granularity. However, different segmentation granularities carry different grammatical and semantic information, and considering word granularity alone restricts the efficient training of neural machine translation systems. To address the data sparseness caused by the lack of Uyghur-Chinese parallel corpora and by complex Uyghur morphology, this paper proposes a multistrategy segmentation-granularity training method covering syllables, marked syllables, words, and syllable-word fusion. To overcome the shortcomings of traditional recurrent and convolutional neural networks, it builds a Transformer Uyghur-Chinese neural machine translation model based entirely on the multihead self-attention mechanism. Results on the CCMT2019 Uyghur-Chinese bilingual dataset show that the multiple-granularity training method performs significantly better than the other segmentation-granularity translation systems, and that the Transformer model obtains a higher BLEU score than a Uyghur-Chinese translation model based on Self-Attention-RNN. |
---|---|
ISSN: | 1574-017X 1875-905X |
DOI: | 10.1155/2021/5744248 |