Enhancing low-resource neural machine translation with syntax-graph guided self-attention

Bibliographic Details
Published in: Knowledge-Based Systems, 2022-06, Vol. 246, p. 108615, Article 108615
Main Authors: Gong, Longchao; Li, Yan; Guo, Junjun; Yu, Zhengtao; Gao, Shengxiang
Format: Article
Language: English
Online Access: Full text
Description
Summary: Most neural machine translation (NMT) models rely only on parallel sentence pairs, and their performance drops sharply in low-resource settings because the models fail to mine the linguistic knowledge in the corpus. Explicitly incorporating prior monolingual knowledge, such as syntax, has been shown to be effective for NMT, particularly in low-resource scenarios. However, existing approaches have not exploited the full potential of NMT architectures. In this paper, we present syntax-graph guided self-attention (SGSA): a neural network model that combines source-side syntactic knowledge with multi-head self-attention. We introduce an additional syntax-aware localness modeling as a bias, which indicates that syntactically relevant parts should receive more attention. The bias is then incorporated into the original attention distribution to form a revised distribution (a minimal sketch of this idea follows the highlights below). Moreover, to preserve the model's strength in capturing meaningful semantic representations of the source sentence, we adopt a node random dropping strategy in the multi-head self-attention subnetworks. Extensive experiments on several standard small-scale datasets demonstrate that SGSA significantly improves the performance of Transformer-based NMT and is also superior to the previous syntax-dependent state of the art.

Highlights:
• We propose a syntax-aware self-attention that integrates syntactic knowledge.
• The syntactic dependency is exploited as guidance, without any extra cost.
• The syntactic dependency is converted into a graph and combined with the NMT model.
• The syntax-aware approach also explicitly exploits sub-word units.
• We introduce multiple attention representations for stronger robustness.
• Experiments demonstrate that the approach achieves state-of-the-art results.
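The abstract describes two mechanisms: an additive syntax-derived bias folded into the self-attention distribution, and random dropping of source nodes. The sketch below is an illustrative NumPy toy of that general idea, not the paper's implementation; the function name syntax_guided_attention, the 0/1 adjacency-matrix input, and the bias_scale and node_drop_prob parameters are assumptions made for this example.

```python
import numpy as np

def syntax_guided_attention(q, k, v, syntax_adj, bias_scale=1.0,
                            node_drop_prob=0.1, rng=None):
    """Toy single-head attention with a syntax-graph bias and node dropping.

    q, k, v: (seq_len, d) arrays; syntax_adj: (seq_len, seq_len) 0/1 matrix
    marking syntactically related token pairs (hypothetical input format).
    """
    if rng is None:
        rng = np.random.default_rng()
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # standard scaled dot-product logits
    scores = scores + bias_scale * syntax_adj        # bias toward syntactically relevant positions
    keep = rng.random(k.shape[0]) >= node_drop_prob  # randomly drop some source nodes
    scores[:, ~keep] = -1e9                          # dropped nodes receive ~zero attention
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # revised attention distribution
    return weights @ v

# Toy usage: 5 tokens, 8-dimensional representations, self-loop-only "syntax graph".
rng = np.random.default_rng(0)
q = rng.normal(size=(5, 8)); k = rng.normal(size=(5, 8)); v = rng.normal(size=(5, 8))
out = syntax_guided_attention(q, k, v, np.eye(5), rng=rng)
print(out.shape)  # (5, 8)
```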
ISSN: 0950-7051, 1872-7409
DOI: 10.1016/j.knosys.2022.108615