End-to-end entity-aware neural machine translation

Bibliographic Details
Published in:	Machine learning 2022-03, Vol.111 (3), p.1181-1203
Main Authors:	Xie, Shufang; Xia, Yingce; Wu, Lijun; Huang, Yiqing; Fan, Yang; Qin, Tao
Format:	Article
Language:	English
Subjects:	Algorithms; Artificial Intelligence; Coders; Computer Science; Control; Inference; Machine Learning; Machine translation; Mechatronics; Natural Language Processing (NLP); Robotics; Simulation and Modeling; Special Issue of the ACML 2021 Journal Track; Translations; User experience
Online Access:	Full text

Description
Summary:	Accurate translation of entities (e.g., person names, organizations, geography) is important in neural machine translation (NMT), as they are usually more difficult to translate than other words, and an incorrect translation of them greatly hurts the user experience. In previous work, entities are either treated in the same way as other words, which leads to inaccurate translation, or handled in multiple steps (named entity recognition, translation, and replacing the entities back), which significantly increases the inference latency. In this work, we propose an end-to-end algorithm that carefully handles the translation of entities. It has two main novel parts compared to conventional NMT models: (1) The encoder and the decoder are equipped with entity classifiers, which are used to determine whether the input token is a named entity. In this way, the encoder and decoder are able to treat named entities differently; (2) The translation loss of each target token is adaptively increased by the probability that the target token is a named entity, which results in more accurate translation of entities. At inference time, these two parts are removed, so the translation model maintains the same inference speed as conventional NMT models. Empirical results on six translation tasks demonstrate the effectiveness of our method in improving translation quality. Specifically, we gain 1.7 BLEU points on Japanese-to-English translation and 4.6 entity F1 points on English-to-Chinese translation, without additional inference cost.
ISSN:	0885-6125 1573-0565
DOI:	10.1007/s10994-021-06073-9
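
Note: The summary states that the translation loss of each target token is increased by the probability that the token is a named entity, but it does not give the formula. The sketch below illustrates one plausible reading, assuming a per-token weight of 1 + alpha * p_entity applied to the negative log-likelihood. The function name entity_weighted_loss, the parameter alpha, and the exact weighting form are assumptions made for illustration and are not taken from the article.

```python
import math


def entity_weighted_loss(token_log_probs, entity_probs, alpha=1.0):
    """Mean negative log-likelihood with likely-entity tokens up-weighted.

    token_log_probs: log-probability the model assigns to each reference
        target token.
    entity_probs: probability (from an entity classifier) that each target
        token is a named entity.
    alpha: hypothetical scaling factor; an assumption, not from the article.
    """
    if len(token_log_probs) != len(entity_probs):
        raise ValueError("inputs must have the same length")
    if not token_log_probs:
        raise ValueError("inputs must not be empty")

    total = 0.0
    for log_p, p_entity in zip(token_log_probs, entity_probs):
        # Assumed weighting: an ordinary token (p_entity = 0) keeps weight 1,
        # a certain entity (p_entity = 1) gets weight 1 + alpha.
        weight = 1.0 + alpha * p_entity
        total += weight * (-log_p)
    return total / len(token_log_probs)


# Two target tokens predicted with probability 0.5 and 0.25.
log_probs = [math.log(0.5), math.log(0.25)]
plain = entity_weighted_loss(log_probs, [0.0, 0.0])     # no entities
weighted = entity_weighted_loss(log_probs, [0.0, 1.0])  # second token is an entity
```

Under this assumed weighting, the second call yields a larger loss than the first, because the error on the entity token counts twice. The weighting only affects training; nothing in the sketch is needed at inference time, which is consistent with the summary's claim of no additional inference cost.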