Multimodal Attention for Neural Machine Translation
The attention mechanism is an important part of neural machine translation (NMT), where it has been reported to produce richer source representations than fixed-length-encoding sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simultaneously focuses on an image and its natural-language description in order to generate a description in another language. We train several variants of our proposed attention mechanism on the Multi30k multilingual image captioning dataset. We show that a dedicated attention for each modality achieves gains of up to 1.6 BLEU and METEOR points over a textual NMT baseline.
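The "dedicated attention for each modality" described in the abstract can be illustrated with a minimal NumPy sketch: the decoder state attends over text annotations and image-region features with separate parameters, and the two context vectors are fused. All names, dimensions, and the dot-product scoring are hypothetical simplifications, not the paper's actual formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention(query, keys, values, W):
    # bilinear scoring (a stand-in for the paper's scoring function)
    scores = keys @ (W @ query)          # one score per source element
    alpha = softmax(scores)              # attention weights, sum to 1
    return alpha @ values, alpha         # weighted context vector

rng = np.random.default_rng(0)
d = 8                                    # hidden size (hypothetical)
txt = rng.normal(size=(5, d))            # 5 source-word annotations
img = rng.normal(size=(4, d))            # 4 image-region features
h = rng.normal(size=d)                   # current decoder state

# dedicated attention per modality: separate parameters for text and image
W_txt = rng.normal(size=(d, d))
W_img = rng.normal(size=(d, d))
c_txt, a_txt = attention(h, txt, txt, W_txt)
c_img, a_img = attention(h, img, img, W_img)

# fuse the two modality contexts (concatenation is one common choice)
c = np.concatenate([c_txt, c_img])
```

At each decoding step the fused context `c` would condition the next target word, so the model can ground its translation in both the source sentence and the image.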
Saved in:
Main Authors: | Caglayan, Ozan; Barrault, Loïc; Bougares, Fethi |
---|---|
Format: | Article |
Language: | English |
Subjects: | Computer Science - Computation and Language; Computer Science - Neural and Evolutionary Computing |
Online Access: | Order full text |
creator | Caglayan, Ozan; Barrault, Loïc; Bougares, Fethi |
description | The attention mechanism is an important part of neural machine translation (NMT), where it has been reported to produce richer source representations than fixed-length-encoding sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simultaneously focuses on an image and its natural-language description in order to generate a description in another language. We train several variants of our proposed attention mechanism on the Multi30k multilingual image captioning dataset. We show that a dedicated attention for each modality achieves gains of up to 1.6 BLEU and METEOR points over a textual NMT baseline. |
format | Article |
date | 2016-09-13 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 (free to read) |
link | https://arxiv.org/abs/1609.03976 |
identifier | DOI: 10.48550/arxiv.1609.03976 |
language | eng |
source | arXiv.org |
subjects | Computer Science - Computation and Language; Computer Science - Neural and Evolutionary Computing |
title | Multimodal Attention for Neural Machine Translation |
url | https://arxiv.org/abs/1609.03976 |