Research on English–Chinese machine translation shift based on word vector similarity
Published in: | Artificial life and robotics 2024-11, Vol.29 (4), p.585-589 |
---|---|
Main author: | |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | In English–Chinese machine translation shift, the handling of out-of-vocabulary (OOV) words has a large impact on translation quality. To address OOV words, this paper proposes a method based on word vector similarity: word vector similarity is computed with the Skip-gram model, each OOV word in the source sentences is replaced with its most similar word, and the replaced corpus is used to train the Transformer model. When the original corpus was used for training, the bilingual evaluation understudy-4 (BLEU-4) scores of the Transformer model on NIST2006 and NIST2008 were 37.29 and 30.73, respectively. When word vector similarity was used for processing and low-frequency OOV words were retained, the BLEU-4 scores on NIST2006 and NIST2008 rose to 37.36 and 30.78, gains of 0.07 and 0.05 points, respectively. Moreover, retaining low-frequency OOV words yielded better translation quality than removing them. The authors conclude from these results that the English–Chinese machine translation shift method based on word vector similarity is reliable and can be applied in practice. |
---|---|
ISSN: | 1433-5298 1614-7456 |
DOI: | 10.1007/s10015-024-00964-5 |
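
The replacement step described in the abstract (swap each OOV source word for its most similar in-vocabulary word) can be illustrated with a minimal sketch. This is not the authors' implementation: the toy three-dimensional vectors stand in for Skip-gram embeddings, the names `cosine`, `most_similar_in_vocab`, `replace_oov`, `VECTORS`, and `VOCAB` are hypothetical, and the sketch assumes that OOV words still have embeddings (for example, from a Skip-gram model trained on a larger corpus than the translation vocabulary covers). Words without any embedding are left unchanged.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def most_similar_in_vocab(word, vectors, vocab):
    """Return the in-vocabulary word closest to `word`, or None if unavailable."""
    if word not in vectors:
        return None
    candidates = [w for w in vocab if w in vectors and w != word]
    if not candidates:
        return None
    return max(candidates, key=lambda w: cosine(vectors[word], vectors[w]))


def replace_oov(tokens, vectors, vocab):
    """Replace each OOV token with its nearest in-vocabulary neighbour."""
    out = []
    for tok in tokens:
        if tok in vocab:
            out.append(tok)
        else:
            sub = most_similar_in_vocab(tok, vectors, vocab)
            # Keep the original token when no substitute can be found.
            out.append(sub if sub is not None else tok)
    return out


# Toy embeddings (assumed values, standing in for trained Skip-gram vectors).
VECTORS = {
    "car": [0.9, 0.1, 0.0],
    "automobile": [0.85, 0.15, 0.05],
    "banana": [0.0, 0.2, 0.9],
    "fruit": [0.1, 0.3, 0.8],
}
# Toy translation vocabulary (assumed).
VOCAB = {"the", "car", "fruit", "is", "red"}

print(replace_oov(["the", "automobile", "is", "red"], VECTORS, VOCAB))
```

In this toy setup "automobile" is outside the vocabulary and is replaced by "car", its nearest neighbour by cosine similarity. In practice the embeddings would come from a trained Skip-gram model, and the frequency threshold that decides which OOV words are retained would be an additional design choice.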