Neural machine translation-oriented data selection and training method
Main authors: | , , |
---|---|
Format: | Patent |
Language: | Chinese; English |
Subjects: | |
Summary: | The invention discloses a data selection and training method for neural machine translation. The method comprises the following steps: constructing a monolingual corpus; preprocessing the monolingual corpus by cleaning, filtering, word segmentation and sub-word segmentation to obtain training data; using the training data to fine-tune a pre-trained model through a language model; encoding the monolingual data of the two languages, comparing the vector similarity of the encoded sentences, merging the two sentences with the highest similarity into pseudo-bilingual data, and constructing a pseudo-parallel corpus; processing the pseudo-parallel corpus with the word segmentation and sub-word segmentation method of the pre-trained model, and initializing the encoder parameters of a neural machine translation framework with the pre-trained model; pre-training a neural machine translation model on the processed pseudo-parallel corpus; and fine-tuning the neural mach |
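The pairing step in the summary (compare the vectors of encoded sentences from the two languages and merge the most similar ones into pseudo-bilingual data) can be illustrated with a minimal sketch. The record does not say which encoder or similarity measure the patent uses, so the sketch assumes cosine similarity over sentence vectors that have already been computed. The names `cosine`, `mine_pseudo_pairs` and the `min_similarity` cutoff are hypothetical and are not taken from the patent.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Encoded = Tuple[str, Vector]  # (sentence text, sentence vector)


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mine_pseudo_pairs(
    source: List[Encoded],
    target: List[Encoded],
    min_similarity: float = 0.0,  # assumed cutoff, not stated in the record
) -> List[Tuple[str, str, float]]:
    """Pair each source sentence with its most similar target sentence."""
    pairs: List[Tuple[str, str, float]] = []
    if not target:
        return pairs
    for src_text, src_vec in source:
        # Score every target sentence and keep the best match.
        best_text, best_score = max(
            ((tgt_text, cosine(src_vec, tgt_vec)) for tgt_text, tgt_vec in target),
            key=lambda item: item[1],
        )
        if best_score >= min_similarity:
            pairs.append((src_text, best_text, best_score))
    return pairs


if __name__ == "__main__":
    # Toy vectors stand in for the output of a fine-tuned pre-trained encoder.
    zh = [("猫 坐 在 垫子 上", [0.9, 0.1, 0.0]), ("今天 下雨", [0.0, 0.2, 0.9])]
    en = [("it is raining today", [0.1, 0.1, 0.95]),
          ("the cat sat on the mat", [0.85, 0.2, 0.05])]
    for src, tgt, score in mine_pseudo_pairs(zh, en, min_similarity=0.5):
        print(f"{score:.3f}\t{src}\t{tgt}")
```

This brute-force search compares every source sentence with every target sentence, which is only practical for small corpora. A large monolingual corpus would likely need an approximate nearest-neighbour index, which is outside what the record describes.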