Ancient Text Translation Model Optimized with GujiBERT and Entropy-SkipBERT


Bibliographic details
Published in: Electronics (Basel) 2024-11, Vol.13 (22), p.4492
Main authors: Yu, Fuxing, Han, Rui, Zhang, Yanchao, Han, Yang
Format: Article
Language: English
Description
Abstract: To address the complex linguistic structure and lexical polysemy of ancient texts, this study proposes a two-stage translation model. First, we combine GujiBERT, GCN, and LSTM to classify ancient texts as historical or non-historical. This classification lays the foundation for the subsequent translation task. To overcome the limitations of the traditional Word2Vec model, we integrate the entropy weight method into the hopping lattice training process and concatenate the resulting word vectors with GujiBERT. The improved method makes word vector generation more efficient and, through dependency weighting, enhances the model's ability to represent lexical polysemy and grammatical structure in ancient documents accurately. We train the translation model on a separate dataset for each text category, which significantly improves translation accuracy. Experimental results show that our classification model improves accuracy by 5% over GujiBERT, and that Entropy-SkipBERT improves BLEU scores by 0.7 and 0.4 on the historical and non-historical datasets, respectively. Overall, the proposed two-stage model improves the BLEU score by 2.7 over the baseline model.
ISSN:2079-9292
DOI:10.3390/electronics13224492
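
The abstract names two building blocks that are easy to illustrate in isolation: the entropy weight method and the splicing of word vectors. The sketch below shows the textbook entropy weight method (columns that vary more across rows receive more weight) and a plain concatenation of two vectors. It is only an illustration under stated assumptions: the abstract does not describe how the weights enter the training process, and the function names `entropy_weights` and `splice` are hypothetical, not taken from the paper.

```python
import math


def entropy_weights(matrix):
    """Textbook entropy weight method over the columns of a non-negative matrix.

    matrix: list of rows; each column is one indicator (feature).
    Returns one weight per column; the weights sum to 1.
    """
    n = len(matrix)
    if n < 2:
        raise ValueError("need at least two rows")
    m = len(matrix[0])
    divergences = []
    for j in range(m):
        column = [row[j] for row in matrix]
        total = sum(column)
        if total <= 0 or min(column) < 0:
            raise ValueError("columns must be non-negative with a positive sum")
        # Shannon entropy of the column's proportions; 0 * log(0) counts as 0.
        entropy = 0.0
        for x in column:
            p = x / total
            if p > 0:
                entropy -= p * math.log(p)
        entropy /= math.log(n)  # scale to the range [0, 1]
        # Low entropy means the column discriminates between rows.
        divergences.append(max(0.0, 1.0 - entropy))
    d_sum = sum(divergences)
    if d_sum <= 1e-12:
        # All columns are uniform: fall back to equal weights.
        return [1.0 / m] * m
    return [d / d_sum for d in divergences]


def splice(static_vec, contextual_vec):
    """Concatenate a static word vector with a contextual one (assumed meaning of 'spliced')."""
    return list(static_vec) + list(contextual_vec)


if __name__ == "__main__":
    # Hypothetical toy data: column 0 is uniform, column 1 varies.
    weights = entropy_weights([[1.0, 1.0], [1.0, 3.0]])
    print(weights)
    print(splice([0.1, 0.2], [0.3, 0.4, 0.5]))
```

In this toy input the first column is identical across rows, so it carries no information and the varying second column receives essentially all of the weight.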