Multimodal Attention for Neural Machine Translation
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | The attention mechanism is an important part of neural machine
translation (NMT), where it has been reported to produce richer source
representations than fixed-length-encoding sequence-to-sequence models. Recently,
the effectiveness of attention has also been explored in the context of image
captioning. In this work, we assess the feasibility of a multimodal attention
mechanism that simultaneously focuses on an image and its natural language
description to generate a description in another language. We train several
variants of our proposed attention mechanism on the Multi30k multilingual image
captioning dataset. We show that a dedicated attention for each modality
achieves a gain of up to 1.6 points in BLEU and METEOR over a textual NMT
baseline. |
---|---|
DOI: | 10.48550/arxiv.1609.03976 |
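
The summary's idea of "a dedicated attention for each modality" can be illustrated with a small sketch. This is not the authors' implementation: the dot-product scoring, the fusion by concatenation, the shared vector dimension, and all names (`attend`, `multimodal_context`, the toy vectors) are assumptions chosen only to keep the example short and dependency-free.

```python
import math


def softmax(scores):
    """Turn raw scores into weights that sum to 1 (numerically stable)."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, annotations):
    """Dot-product attention (an assumed scoring function).

    Returns the weighted sum of the annotation vectors and the weights.
    """
    scores = [sum(q * a for q, a in zip(query, vec)) for vec in annotations]
    weights = softmax(scores)
    dim = len(annotations[0])
    context = [
        sum(w * vec[i] for w, vec in zip(weights, annotations))
        for i in range(dim)
    ]
    return context, weights


def multimodal_context(decoder_state, text_annotations, image_annotations):
    """Run one separate attention per modality, then fuse the two contexts.

    Fusion by concatenation is an assumption made for this sketch.
    """
    text_ctx, text_w = attend(decoder_state, text_annotations)
    image_ctx, image_w = attend(decoder_state, image_annotations)
    return text_ctx + image_ctx, text_w, image_w


# Toy inputs (hypothetical): three source-word vectors, two image-region vectors.
decoder_state = [1.0, 0.0]
text_annotations = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
image_annotations = [[0.0, 2.0], [2.0, 0.0]]

fused, text_w, image_w = multimodal_context(
    decoder_state, text_annotations, image_annotations
)
print(fused, text_w, image_w)
```

Each modality gets its own weight distribution, so at a given decoding step the model can concentrate on one source word and, independently, on one image region before the two context vectors are combined.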