Reranking Hypotheses in Translation Models Using Human Markup

Full description

Bibliographic details
Published in: Journal of computer & systems sciences international, 2024-08, Vol. 63 (4), p. 679-686
Main authors: Vorontsov, K. V., Skachkov, N. A.
Format: Article
Language: English
Online access: Full text
Description
Summary: Modern machine translation systems are trained on large volumes of parallel data collected by heuristic web crawling. The poor quality of this data leads to systematic translation errors that can be quite noticeable to humans. To fix such errors, this study introduces models for reranking hypotheses that are based on human markup. The use of human markup is shown not only to increase overall translation quality but also to significantly reduce the number of systematic translation errors. In addition, the relative simplicity of human markup and of its integration into model training opens up new opportunities for adapting translation models to new domains such as online retail.
ISSN: 1064-2307, 1555-6530
DOI: 10.1134/S1064230724700497