The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
Main Authors: , , , , , , , , , , ,
Format: Article
Language: eng
Subjects:
Online Access: Order full text
Abstract: The past year has witnessed rapid advances in sequence-to-sequence (seq2seq) modeling for Machine Translation (MT). The classic RNN-based approaches to MT were first outperformed by the convolutional seq2seq model, which was in turn outperformed by the more recent Transformer model. Each of these new approaches consists of a fundamental architecture accompanied by a set of modeling and training techniques that are in principle applicable to other seq2seq architectures. In this paper, we tease apart the new architectures and their accompanying techniques in two ways. First, we identify several key modeling and training techniques and apply them to the RNN architecture, yielding a new RNMT+ model that outperforms all three fundamental architectures on the benchmark WMT'14 English-to-French and English-to-German tasks. Second, we analyze the properties of each fundamental seq2seq architecture and devise new hybrid architectures intended to combine their strengths. Our hybrid models obtain further improvements, outperforming the RNMT+ model on both benchmark datasets.
DOI: 10.48550/arxiv.1804.09849
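The abstract's second contribution, hybrid architectures that combine the strengths of the fundamental seq2seq models, can be illustrated with a minimal sketch. The snippet below is a hypothetical example, not the paper's implementation: it pairs a Transformer-style self-attention encoder with an LSTM decoder that attends over the encoder states, in the spirit of the encoder/decoder pairings the abstract describes. All class names, layer sizes, and the single-head dot-product attention are illustrative assumptions; positional encodings and masking are omitted for brevity.

```python
# Hypothetical hybrid seq2seq sketch: Transformer encoder + LSTM decoder.
# Not the paper's implementation; sizes and attention are assumptions.
import torch
import torch.nn as nn

class HybridSeq2Seq(nn.Module):
    def __init__(self, src_vocab, tgt_vocab, d_model=512, nhead=8,
                 num_encoder_layers=6, num_decoder_layers=2):
        super().__init__()
        self.src_embed = nn.Embedding(src_vocab, d_model)
        self.tgt_embed = nn.Embedding(tgt_vocab, d_model)
        enc_layer = nn.TransformerEncoderLayer(d_model, nhead, batch_first=True)
        # Self-attention encoder (positional encodings omitted for brevity).
        self.encoder = nn.TransformerEncoder(enc_layer, num_encoder_layers)
        # RNN decoder consumes the previous target embedding concatenated
        # with an attention context over the encoder states.
        self.decoder = nn.LSTM(2 * d_model, d_model,
                               num_layers=num_decoder_layers, batch_first=True)
        self.out = nn.Linear(d_model, tgt_vocab)

    def forward(self, src_ids, tgt_ids):
        memory = self.encoder(self.src_embed(src_ids))       # (B, S, D)
        tgt = self.tgt_embed(tgt_ids)                        # (B, T, D)
        outputs, state = [], None
        context = torch.zeros_like(tgt[:, :1])               # initial context
        for t in range(tgt.size(1)):                         # step the RNN
            step_in = torch.cat([tgt[:, t:t+1], context], dim=-1)
            h, state = self.decoder(step_in, state)
            # Single-head dot-product attention over encoder memory.
            scores = torch.bmm(h, memory.transpose(1, 2))    # (B, 1, S)
            context = torch.bmm(scores.softmax(-1), memory)  # (B, 1, D)
            outputs.append(self.out(h + context))
        return torch.cat(outputs, dim=1)                     # (B, T, V)

model = HybridSeq2Seq(src_vocab=32000, tgt_vocab=32000)
logits = model(torch.randint(0, 32000, (2, 7)),
               torch.randint(0, 32000, (2, 5)))
print(logits.shape)  # torch.Size([2, 5, 32000])
```

The design choice mirrors the abstract's analysis: the self-attention encoder captures long-range source context in parallel, while the recurrent decoder generates step by step, conditioning each step on an attention-weighted summary of the encoder states.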