Your Autoregressive Generative Model Can be Better If You Treat It as an Energy-Based One

Bibliographic Details
Main authors: Wang, Yezhen; Che, Tong; Li, Bo; Song, Kaitao; Pei, Hengzhi; Bengio, Yoshua; Li, Dongsheng
Format: Article
Language: eng
Subjects:
Online access: Order full text
description Autoregressive generative models are commonly used, especially for tasks involving sequential data. They have, however, been plagued by a slew of inherent flaws due to the intrinsic characteristics of chain-style conditional modeling (e.g., exposure bias or lack of long-range coherence), severely limiting their ability to model distributions properly. In this paper, we propose a novel method, termed E-ARM, for training autoregressive generative models that takes advantage of a well-designed energy-based learning objective. By leveraging the extra degree of freedom of the softmax operation, we can make the autoregressive model itself an energy-based model for measuring the likelihood of input without introducing any extra parameters. Furthermore, we show that E-ARM can be trained efficiently and is capable of alleviating the exposure bias problem and of increasing temporal coherence for autoregressive generative models. Extensive empirical results, covering benchmarks like language modeling, neural machine translation, and image generation, demonstrate the effectiveness of the proposed approach.
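The "extra degree of freedom of the softmax" mentioned in the abstract can be sketched numerically: softmax conditionals are invariant to a per-position additive constant, so the normalizer logsumexp(logits) is a free scalar that can be read off as a (negative) energy. The sketch below uses random stand-in logits and a JEM-style logsumexp energy; this parameterization is our illustrative assumption, not necessarily E-ARM's exact objective.

```python
import numpy as np

rng = np.random.default_rng(0)
batch, seq_len, vocab = 4, 8, 50
# Stand-in for per-position logits from an autoregressive model,
# shape (batch, seq_len, vocab); a real AR network would produce these.
logits = rng.normal(size=(batch, seq_len, vocab))

def log_softmax(z, axis=-1):
    # Numerically stable log-softmax.
    m = z.max(axis=axis, keepdims=True)
    return z - m - np.log(np.exp(z - m).sum(axis=axis, keepdims=True))

def logsumexp(z, axis=-1):
    # Numerically stable log-sum-exp, reduced over `axis`.
    m = z.max(axis=axis, keepdims=True)
    return (m + np.log(np.exp(z - m).sum(axis=axis, keepdims=True))).squeeze(axis)

# The conditionals discard the normalizer; a JEM-style reading
# (our illustrative assumption) reuses it as a negative energy.
log_probs = log_softmax(logits)           # per-token conditionals, unchanged
energy = -logsumexp(logits).sum(axis=-1)  # one scalar energy per sequence

# Shifting the logits changes the energy but not the conditionals,
# which is exactly the degree of freedom the abstract refers to.
shifted = logits + 3.0
assert np.allclose(log_softmax(shifted), log_probs)
assert not np.allclose(-logsumexp(shifted).sum(axis=-1), energy)
```

Because the conditionals are untouched, the autoregressive likelihood is trained as usual while the same logits simultaneously define an unnormalized sequence-level energy, which is how an energy-based objective can be added without extra parameters.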
creationdate 2022-06-26
rights http://creativecommons.org/licenses/by/4.0
identifier DOI: 10.48550/arxiv.2206.12840
source arXiv.org
subjects Computer Science - Computation and Language
Computer Science - Learning