Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding

In sequence-to-sequence learning, e.g., natural language generation, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last encoder layer, recent work has proposed to use representations fr...

Detailed description

Bibliographic details
Published in:	arXiv.org 2022-08
Main authors:	Liu, Fenglin; Ren, Xuancheng; Zhao, Guangxiang; You, Chenyu; Ma, Xuewei; Wu, Xian; Xu, Sun
Format:	Article
Language:	eng
Subjects:	Coders; Decoding; Learning; Machine translation; Representations
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Liu, Fenglin; Ren, Xuancheng; Zhao, Guangxiang; You, Chenyu; Ma, Xuewei; Wu, Xian; Xu, Sun
description	In sequence-to-sequence learning, e.g., natural language generation, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last encoder layer, recent work has proposed to use representations from different encoder layers for diversified levels of information. Nonetheless, the decoder still obtains only a single view of the source sequences, which might lead to insufficient training of the encoder layer stack due to the hierarchy bypassing problem. In this work, we propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences. Systematic experiments and analyses show that we successfully address the hierarchy bypassing problem, require almost negligible parameter increase, and substantially improve the performance of sequence-to-sequence learning with deep representations on five diverse tasks, i.e., machine translation, abstractive summarization, image captioning, video captioning, medical report generation, and paraphrase generation. In particular, our approach achieves new state-of-the-art results on ten benchmark datasets, including a low-resource machine translation dataset and two low-resource medical report generation datasets.
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2404497777</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2404497777</sourcerecordid><originalsourceid>FETCH-proquest_journals_24044977773</originalsourceid><addsrcrecordid>eNqNjEsKwjAYhIMgWLR3CLgOxDS1uvYN6kJE3ZVgf9vUmmgeirc3ggdwNsPwzUwLRSxJBmTEGeug2NqaUsqGGUvTJEKnHbhKqqtUJRaqwKvb3ejnN22F80Y0eC1U6UUJeAEKjHBSK_ySrgrgDYYcpQW88Y2T5CDhhadw1kXY91D7IhoL8c-7qD-f7SdLEv4fHqzLa-2NCihnnHI-zoKS_1oflR9B2Q</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2404497777</pqid></control><display><type>article</type><title>Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding</title><source>Free E- Journals</source><creator>Liu, Fenglin ; Ren, Xuancheng ; Zhao, Guangxiang ; You, Chenyu ; Ma, Xuewei ; Wu, Xian ; Xu, Sun</creator><creatorcontrib>Liu, Fenglin ; Ren, Xuancheng ; Zhao, Guangxiang ; You, Chenyu ; Ma, Xuewei ; Wu, Xian ; Xu, Sun</creatorcontrib><description>In sequence-to-sequence learning, e.g., natural language generation, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last encoder layer, recent work has proposed to use representations from different encoder layers for diversified levels of information. Nonetheless, the decoder still obtains only a single view of the source sequences, which might lead to insufficient training of the encoder layer stack due to the hierarchy bypassing problem. In this work, we propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences. 
Systematic experiments and analyses show that we successfully address the hierarchy bypassing problem, require almost negligible parameter increase, and substantially improve the performance of sequence-to-sequence learning with deep representations on five diverse tasks, i.e., machine translation, abstractive summarization, image captioning, video captioning, medical report generation, and paraphrase generation. In particular, our approach achieves new state-of-the-art results on ten benchmark datasets, including a low-resource machine translation dataset and two low-resource medical report generation datasets.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Coders ; Decoding ; Learning ; Machine translation ; Representations</subject><ispartof>arXiv.org, 2022-08</ispartof><rights>2022. This work is published under http://creativecommons.org/licenses/by/4.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Liu, Fenglin</creatorcontrib><creatorcontrib>Ren, Xuancheng</creatorcontrib><creatorcontrib>Zhao, Guangxiang</creatorcontrib><creatorcontrib>You, Chenyu</creatorcontrib><creatorcontrib>Ma, Xuewei</creatorcontrib><creatorcontrib>Wu, Xian</creatorcontrib><creatorcontrib>Xu, Sun</creatorcontrib><title>Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding</title><title>arXiv.org</title><description>In sequence-to-sequence learning, e.g., natural language generation, the decoder relies on the attention mechanism to 
efficiently extract information from the encoder. While it is common practice to draw information from only the last encoder layer, recent work has proposed to use representations from different encoder layers for diversified levels of information. Nonetheless, the decoder still obtains only a single view of the source sequences, which might lead to insufficient training of the encoder layer stack due to the hierarchy bypassing problem. In this work, we propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences. Systematic experiments and analyses show that we successfully address the hierarchy bypassing problem, require almost negligible parameter increase, and substantially improve the performance of sequence-to-sequence learning with deep representations on five diverse tasks, i.e., machine translation, abstractive summarization, image captioning, video captioning, medical report generation, and paraphrase generation. 
In particular, our approach achieves new state-of-the-art results on ten benchmark datasets, including a low-resource machine translation dataset and two low-resource medical report generation datasets.</description><subject>Coders</subject><subject>Decoding</subject><subject>Learning</subject><subject>Machine translation</subject><subject>Representations</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNqNjEsKwjAYhIMgWLR3CLgOxDS1uvYN6kJE3ZVgf9vUmmgeirc3ggdwNsPwzUwLRSxJBmTEGeug2NqaUsqGGUvTJEKnHbhKqqtUJRaqwKvb3ejnN22F80Y0eC1U6UUJeAEKjHBSK_ySrgrgDYYcpQW88Y2T5CDhhadw1kXY91D7IhoL8c-7qD-f7SdLEv4fHqzLa-2NCihnnHI-zoKS_1oflR9B2Q</recordid><startdate>20220829</startdate><enddate>20220829</enddate><creator>Liu, Fenglin</creator><creator>Ren, Xuancheng</creator><creator>Zhao, Guangxiang</creator><creator>You, Chenyu</creator><creator>Ma, Xuewei</creator><creator>Wu, Xian</creator><creator>Xu, Sun</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20220829</creationdate><title>Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding</title><author>Liu, Fenglin ; Ren, Xuancheng ; Zhao, Guangxiang ; You, Chenyu ; Ma, Xuewei ; Wu, Xian ; Xu, 
Sun</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_24044977773</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Coders</topic><topic>Decoding</topic><topic>Learning</topic><topic>Machine translation</topic><topic>Representations</topic><toplevel>online_resources</toplevel><creatorcontrib>Liu, Fenglin</creatorcontrib><creatorcontrib>Ren, Xuancheng</creatorcontrib><creatorcontrib>Zhao, Guangxiang</creatorcontrib><creatorcontrib>You, Chenyu</creatorcontrib><creatorcontrib>Ma, Xuewei</creatorcontrib><creatorcontrib>Wu, Xian</creatorcontrib><creatorcontrib>Xu, Sun</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Liu, Fenglin</au><au>Ren, Xuancheng</au><au>Zhao, Guangxiang</au><au>You, Chenyu</au><au>Ma, Xuewei</au><au>Wu, Xian</au><au>Xu, 
Sun</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding</atitle><jtitle>arXiv.org</jtitle><date>2022-08-29</date><risdate>2022</risdate><eissn>2331-8422</eissn><abstract>In sequence-to-sequence learning, e.g., natural language generation, the decoder relies on the attention mechanism to efficiently extract information from the encoder. While it is common practice to draw information from only the last encoder layer, recent work has proposed to use representations from different encoder layers for diversified levels of information. Nonetheless, the decoder still obtains only a single view of the source sequences, which might lead to insufficient training of the encoder layer stack due to the hierarchy bypassing problem. In this work, we propose layer-wise multi-view decoding, where for each decoder layer, together with the representations from the last encoder layer, which serve as a global view, those from other encoder layers are supplemented for a stereoscopic view of the source sequences. Systematic experiments and analyses show that we successfully address the hierarchy bypassing problem, require almost negligible parameter increase, and substantially improve the performance of sequence-to-sequence learning with deep representations on five diverse tasks, i.e., machine translation, abstractive summarization, image captioning, video captioning, medical report generation, and paraphrase generation. In particular, our approach achieves new state-of-the-art results on ten benchmark datasets, including a low-resource machine translation dataset and two low-resource medical report generation datasets.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2022-08
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2404497777
source	Free E- Journals
subjects	Coders; Decoding; Learning; Machine translation; Representations
title	Rethinking and Improving Natural Language Generation with Layer-Wise Multi-View Decoding
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-05T19%3A56%3A56IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Rethinking%20and%20Improving%20Natural%20Language%20Generation%20with%20Layer-Wise%20Multi-View%20Decoding&rft.jtitle=arXiv.org&rft.au=Liu,%20Fenglin&rft.date=2022-08-29&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2404497777%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2404497777&rft_id=info:pmid/&rfr_iscdi=true