Multimodal Attention for Neural Machine Translation
The attention mechanism is an important part of neural machine translation (NMT), where it has been reported to produce richer source representations than fixed-length-encoding sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simultaneously focuses on an image and its natural-language description in order to generate a description in another language. We train several variants of our proposed attention mechanism on the Multi30k multilingual image captioning dataset. We show that a dedicated attention for each modality achieves gains of up to 1.6 BLEU and METEOR points over a textual NMT baseline.
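The "dedicated attention for each modality" described in the abstract can be illustrated with a minimal NumPy sketch: the decoder state attends over text annotations and image-region features with separate parameters, and the two context vectors are fused. All names, dimensions, and the dot-product scoring are hypothetical simplifications, not the paper's actual formulation.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def attention(query, keys, values, W):
    # bilinear scoring (a stand-in for the paper's scoring function)
    scores = keys @ (W @ query)          # one score per source element
    alpha = softmax(scores)              # attention weights, sum to 1
    return alpha @ values, alpha         # weighted context vector

rng = np.random.default_rng(0)
d = 8                                    # hidden size (hypothetical)
txt = rng.normal(size=(5, d))            # 5 source-word annotations
img = rng.normal(size=(4, d))            # 4 image-region features
h = rng.normal(size=d)                   # current decoder state

# dedicated attention per modality: separate parameters for text and image
W_txt = rng.normal(size=(d, d))
W_img = rng.normal(size=(d, d))
c_txt, a_txt = attention(h, txt, txt, W_txt)
c_img, a_img = attention(h, img, img, W_img)

# fuse the two modality contexts (concatenation is one common choice)
c = np.concatenate([c_txt, c_img])
```

At each decoding step the fused context `c` would condition the next target word, so the model can ground its translation in both the source sentence and the image.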
Saved in:
Main Authors: | Caglayan, Ozan; Barrault, Loïc; Bougares, Fethi |
---|---|
Format: | Article |
Language: | English |
Subjects: | Computer Science - Computation and Language; Computer Science - Neural and Evolutionary Computing |
Online Access: | Order full text |
creator | Caglayan, Ozan; Barrault, Loïc; Bougares, Fethi |
description | The attention mechanism is an important part of neural machine translation (NMT), where it has been reported to produce richer source representations than fixed-length-encoding sequence-to-sequence models. Recently, the effectiveness of attention has also been explored in the context of image captioning. In this work, we assess the feasibility of a multimodal attention mechanism that simultaneously focuses on an image and its natural-language description in order to generate a description in another language. We train several variants of our proposed attention mechanism on the Multi30k multilingual image captioning dataset. We show that a dedicated attention for each modality achieves gains of up to 1.6 BLEU and METEOR points over a textual NMT baseline. |
format | Article |
date | 2016-09-13 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 (free to read) |
link | https://arxiv.org/abs/1609.03976 |
identifier | DOI: 10.48550/arxiv.1609.03976 |
language | eng |
source | arXiv.org |
subjects | Computer Science - Computation and Language; Computer Science - Neural and Evolutionary Computing |
title | Multimodal Attention for Neural Machine Translation |
url | https://arxiv.org/abs/1609.03976 |