Multimodal Attention for Neural Machine Translation

The attention mechanism is an important part of neural machine translation (NMT), where it has been reported to produce richer source representations than the fixed-length encodings of earlier sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simultaneously focuses on an image and its natural-language description in order to generate a description in another language. We train several variants of our proposed attention mechanism on the Multi30k multilingual image captioning dataset. We show that a dedicated attention for each modality achieves improvements of up to 1.6 points in BLEU and METEOR over a textual NMT baseline.
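The "dedicated attention for each modality" described in the abstract can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: the dot-product scoring, the projection matrices `W_txt`/`W_img`, and fusion by concatenation are assumptions chosen for brevity; the paper trains several variants whose exact parameterization is not given in the abstract.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over the last axis."""
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attend(query, annotations, W):
    """Dot-product attention over one modality's annotation vectors.

    W is a modality-specific projection, so each modality gets its own
    ("dedicated") attention parameters. Returns the context vector and
    the attention weights.
    """
    scores = annotations @ (W @ query)   # (n,): one score per annotation
    alpha = softmax(scores)              # attention weights, sum to 1
    return alpha @ annotations, alpha    # context (d,), weights (n,)

def multimodal_context(query, txt_ann, img_ann, W_txt, W_img):
    """Dedicated attention per modality; contexts fused by concatenation."""
    c_txt, _ = attend(query, txt_ann, W_txt)
    c_img, _ = attend(query, img_ann, W_img)
    return np.concatenate([c_txt, c_img])
```

In a full NMT decoder, `query` would be the decoder's hidden state at the current target position, `txt_ann` the encoder states of the source sentence, and `img_ann` spatial features extracted from the image, with a fresh multimodal context computed at every decoding step.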

Bibliographic details
Main authors: Caglayan, Ozan; Barrault, Loïc; Bougares, Fethi
Format: Article
Language: English
Online access: order full text
DOI: 10.48550/arxiv.1609.03976
Source: arXiv.org
Subjects: Computer Science - Computation and Language; Computer Science - Neural and Evolutionary Computing
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-14T14%3A10%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Multimodal%20Attention%20for%20Neural%20Machine%20Translation&rft.au=Caglayan,%20Ozan&rft.date=2016-09-13&rft_id=info:doi/10.48550/arxiv.1609.03976&rft_dat=%3Carxiv_GOX%3E1609_03976%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true