Training Deeper Neural Machine Translation Models with Transparent Attention
Saved in:
Main authors:
Format: Article
Language: eng
Subjects:
Online access: Order full text
Abstract: While current state-of-the-art NMT models, such as RNN seq2seq and Transformers, possess a large number of parameters, they are still shallow in comparison to convolutional models used for both text and vision applications. In this work we attempt to train significantly (2-3x) deeper Transformer and Bi-RNN encoders for machine translation. We propose a simple modification to the attention mechanism that eases the optimization of deeper models, and results in consistent gains of 0.7-1.1 BLEU on the benchmark WMT'14 English-German and WMT'15 Czech-English tasks for both architectures.
DOI: 10.48550/arxiv.1808.07561
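
The abstract describes a simple modification to the attention mechanism (the "transparent attention" of the title) that eases optimization of deeper encoders, without spelling out the mechanics. Below is a minimal, hedged sketch of one way such a mechanism can be realized, assuming that each decoder layer attends to a learned softmax-weighted combination of all encoder layer outputs rather than only the top layer's output; the function and variable names are illustrative, not taken from the paper.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def transparent_encoder_memories(layer_outputs, layer_logits):
    """Weighted combination of all encoder layer outputs (assumed scheme).

    layer_outputs: (num_enc_layers + 1, seq_len, d_model) -- the embedding
        output plus each encoder layer's output (assumed layout).
    layer_logits: (num_dec_layers, num_enc_layers + 1) -- scalars that would
        be trainable in a real model; a softmax over the last axis gives one
        weight per encoder layer for every decoder layer.

    Returns (num_dec_layers, seq_len, d_model): the memories each decoder
    layer attends to, instead of only the top encoder layer's states.
    """
    weights = softmax(layer_logits, axis=-1)                 # (dec, enc+1)
    return np.einsum("de,esm->dsm", weights, layer_outputs)  # sum over enc layers

# Toy shapes: a 6-layer encoder (+ embeddings) and a 4-layer decoder.
num_enc, num_dec, seq_len, d_model = 6, 4, 10, 16
layer_outputs = np.random.randn(num_enc + 1, seq_len, d_model)
layer_logits = np.zeros((num_dec, num_enc + 1))  # placeholder; learned in practice
memories = transparent_encoder_memories(layer_outputs, layer_logits)
print(memories.shape)  # (4, 10, 16)
```

In such a setup the combination weights give gradients a direct path to lower encoder layers, which is consistent with the abstract's claim that the modification eases the optimization of deeper models.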