Tibetan-Chinese neural machine translation method based on cross-language pre-training model
Format: | Patent |
---|---|
Language: | Chinese ; English |
Summary: | The invention provides a Tibetan-Chinese neural machine translation method based on a cross-language pre-training model and relates to the technical field of language translation. The method comprises the following steps: preprocessing preset Tibetan-Chinese parallel data to obtain a corpus to be processed; augmenting the corpus by synonym replacement and back translation; segmenting the Tibetan and Chinese parallel corpora into sub-word units with the subword-nmt algorithm, rebuilding the vocabulary, and optimizing the new vocabulary with a VOLT model; using the multilingual pre-trained translation model of mRASP, which covers a plurality of language pairs, and training on the Tibetan-Chinese parallel corpora with a transformer-big neural machine translation architecture to obtain a translation model; and evaluating the translation model with different length penalty factors during decoding. |
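
The last step of the summary, evaluating with different length penalty factors during decoding, can be illustrated with a minimal sketch. The record does not state which formula the patent uses; the sketch assumes the common normalization in which a hypothesis's summed log-probability is divided by its length raised to the penalty factor. The names `length_normalized_score`, `pick_best`, and `candidates`, as well as the candidate tokens and scores, are hypothetical and only serve the illustration.

```python
from typing import List, Tuple

# A beam hypothesis: (tokens, summed log-probability of those tokens).
Hypothesis = Tuple[List[str], float]


def length_normalized_score(sum_logprob: float, length: int, lenpen: float) -> float:
    """Return sum_logprob / length**lenpen (assumed normalization, not from the patent)."""
    if length <= 0:
        raise ValueError("length must be positive")
    return sum_logprob / (length ** lenpen)


def pick_best(hypotheses: List[Hypothesis], lenpen: float) -> List[str]:
    """Return the tokens of the hypothesis with the highest normalized score."""
    best = max(
        hypotheses,
        key=lambda h: length_normalized_score(h[1], len(h[0]), lenpen),
    )
    return best[0]


# Made-up candidates: a short hypothesis and a longer, less probable one.
candidates: List[Hypothesis] = [
    (["a", "b"], -2.0),
    (["a", "b", "c", "d"], -3.0),
]

if __name__ == "__main__":
    # Sweep several penalty factors, as one would when comparing decoding settings.
    for lenpen in (0.0, 0.5, 1.0, 2.0):
        print(lenpen, pick_best(candidates, lenpen))
```

With these made-up numbers, a factor of 0.0 selects the shorter hypothesis and a factor of 1.0 selects the longer one, which is why the choice of factor can change measured translation quality.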