Tibetan-Chinese neural machine translation method based on cross-language pre-training model

Bibliographic details
Main authors: YONG CUO, NIMAZHAXI, YANG DAN
Format: Patent
Language: Chinese; English
Description
Abstract: The invention provides a Tibetan-Chinese neural machine translation method based on a cross-language pre-training model and relates to the technical field of language translation. The method comprises the following steps: preprocessing preset Tibetan-Chinese parallel data to obtain a corpus to be processed; applying synonym replacement and back translation to the corpus as data augmentation; segmenting the Tibetan and Chinese sides of the parallel corpus with the subword-nmt algorithm, splitting all words into subword units, building a new vocabulary, and optimizing that vocabulary with the VOLT model; starting from the mRASP model, a multilingual pre-trained translation model covering multiple language pairs, and training on the Tibetan-Chinese parallel corpus with a Transformer-big neural machine translation architecture to obtain a translation model; and evaluating the translation model with different length penalty factors during decoding.
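The data augmentation step pairs synonym replacement with back translation. The Python sketch below illustrates only the synonym-replacement half, under assumptions the abstract does not state: a token-level synonym table (the hypothetical `synonyms` dict), perturbation of the Chinese side of each (Tibetan, Chinese) pair only, and retention of the original pair alongside any changed copy. The function names `synonym_replace` and `augment_pairs` are illustrative, not taken from the patent.

```python
import random


def synonym_replace(tokens, synonyms, p, rng):
    """Return a copy of `tokens` in which each token that has an entry in
    `synonyms` is swapped for a randomly chosen synonym with probability p."""
    out = []
    for tok in tokens:
        candidates = synonyms.get(tok)
        if candidates and rng.random() < p:
            out.append(rng.choice(candidates))
        else:
            out.append(tok)
    return out


def augment_pairs(pairs, synonyms, p=0.1, seed=0):
    """Keep every original (tibetan_tokens, chinese_tokens) pair and add a
    variant whose Chinese side was perturbed, if anything actually changed.
    Assumption: only the Chinese side is perturbed."""
    rng = random.Random(seed)
    result = []
    for bo_tokens, zh_tokens in pairs:
        result.append((bo_tokens, zh_tokens))
        new_zh = synonym_replace(zh_tokens, synonyms, p, rng)
        if new_zh != zh_tokens:
            result.append((bo_tokens, new_zh))
    return result


if __name__ == "__main__":
    # Hypothetical toy synonym table; a real system would use a Chinese thesaurus.
    synonyms = {"高兴": ["快乐", "愉快"], "漂亮": ["美丽"]}
    # Placeholder strings stand in for real Tibetan tokens.
    pairs = [(["bo_tok1", "bo_tok2"], ["我", "很", "高兴"])]
    for pair in augment_pairs(pairs, synonyms, p=1.0):
        print(pair)
```

Back translation would add further pairs by translating monolingual Chinese text into Tibetan with a reverse-direction model; that step needs a trained model and is not sketched here.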
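The segmentation step relies on subword-nmt, which implements byte-pair encoding (BPE): starting from single characters, it repeatedly merges the most frequent adjacent symbol pair and records each merge as a rule. The following minimal Python sketch learns and applies such merges. The helper names are hypothetical, the `</w>` end-of-word marker follows the original BPE formulation, and the VOLT vocabulary optimization step is not modeled.

```python
import collections
import re


def get_pair_counts(vocab):
    """Count adjacent symbol pairs, weighted by word frequency."""
    pairs = collections.Counter()
    for word, freq in vocab.items():
        symbols = word.split()
        for a, b in zip(symbols, symbols[1:]):
            pairs[(a, b)] += freq
    return pairs


def merge_pair(pair, vocab):
    """Replace every standalone occurrence of 'a b' with 'ab' in all words."""
    merged = "".join(pair)
    pattern = re.compile(r"(?<!\S)" + re.escape(" ".join(pair)) + r"(?!\S)")
    # A lambda avoids backslash interpretation in the replacement string.
    return {pattern.sub(lambda _m: merged, word): freq for word, freq in vocab.items()}


def learn_bpe(word_counts, num_merges):
    """Learn up to num_merges merge rules from a {word: frequency} dict."""
    vocab = {" ".join(word) + " </w>": freq for word, freq in word_counts.items()}
    merges = []
    for _ in range(num_merges):
        pairs = get_pair_counts(vocab)
        if not pairs:
            break
        best = max(pairs, key=pairs.get)  # ties go to the first pair seen
        vocab = merge_pair(best, vocab)
        merges.append(best)
    return merges


def segment(word, merges):
    """Split a word into subword units by applying merges in learned order."""
    symbols = list(word) + ["</w>"]
    for a, b in merges:
        out, i = [], 0
        while i < len(symbols):
            if i + 1 < len(symbols) and symbols[i] == a and symbols[i + 1] == b:
                out.append(a + b)
                i += 2
            else:
                out.append(symbols[i])
                i += 1
        symbols = out
    return symbols


if __name__ == "__main__":
    # Toy word frequencies; a real run would count words in the training corpus.
    counts = {"low": 5, "lower": 2, "newest": 6, "widest": 3}
    merges = learn_bpe(counts, num_merges=10)
    print(merges)
    print(segment("lowest", merges))
```

Applying the merges one by one in learned order suffices because a merge can only involve symbols created by earlier merges. The real subword-nmt tool marks word-internal boundaries with `@@` in its output rather than `</w>`.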
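The final step evaluates the model under different length penalty factors, but the abstract does not give the penalty formula. One common convention, used for example by fairseq's `--lenpen` option, divides a hypothesis's summed token log-probability by its length raised to a power α. The Python sketch below assumes that form. The hypotheses and their log-probabilities are made up for illustration, and `normalized_score` and `best_hypothesis` are hypothetical names.

```python
def normalized_score(token_logprobs, alpha):
    """Length-normalized score: sum of token log-probs divided by length**alpha.
    alpha = 0 disables normalization; because log-probs are negative, a larger
    alpha shrinks long hypotheses' penalties more and so favors longer outputs."""
    if not token_logprobs:
        raise ValueError("hypothesis must contain at least one token")
    return sum(token_logprobs) / (len(token_logprobs) ** alpha)


def best_hypothesis(hypotheses, alpha):
    """Pick the (text, token_logprobs) pair with the highest normalized score."""
    return max(hypotheses, key=lambda h: normalized_score(h[1], alpha))


if __name__ == "__main__":
    # Hypothetical beam outputs with made-up per-token log-probabilities.
    hyps = [
        ("short hypothesis", [-0.5, -0.5]),
        ("longer hypothesis", [-0.4, -0.4, -0.4, -0.4]),
    ]
    for alpha in (0.0, 0.5, 1.0, 1.5):
        text, _ = best_hypothesis(hyps, alpha)
        print(f"alpha={alpha}: {text}")
```

In practice, each α would typically be compared by scoring the decoded output with a metric such as BLEU on held-out data; that scoring step is omitted here.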