Transfer Learning for Text Diffusion Models
In this report, we explore the potential for text diffusion to replace autoregressive (AR) decoding for the training and deployment of large language models (LLMs). We are particularly interested to see whether pretrained AR models can be transformed into text diffusion models through a lightweight adaptation procedure we call "AR2Diff".
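To make the contrast with AR decoding concrete, the sketch below illustrates one common form of non-autoregressive text diffusion decoding: all target positions start as a mask token and are re-predicted in parallel over a fixed number of denoising steps, committing the most confident positions first. This is a generic, MaskGIT/SUNDAE-style illustration rather than the specific AR2Diff procedure from the paper; `toy_denoiser`, `MASK_ID`, the vocabulary size, and the step schedule are all hypothetical stand-ins.

```python
import numpy as np

VOCAB_SIZE = 100   # hypothetical vocabulary size
MASK_ID = 0        # hypothetical id of the mask token
TARGET_LEN = 8     # length of the generated target sequence
NUM_STEPS = 4      # number of parallel denoising steps
rng = np.random.default_rng(0)


def toy_denoiser(prefix_ids, target_ids):
    """Hypothetical stand-in for a trained decoder-only model with a
    prefix-LM attention pattern: returns logits over the vocabulary for
    every target position, conditioned on the prefix and the current
    (partially masked) target."""
    return rng.normal(size=(len(target_ids), VOCAB_SIZE))


def diffusion_decode(prefix_ids):
    # Start from a fully masked target and refine it over NUM_STEPS passes.
    target = np.full(TARGET_LEN, MASK_ID)
    for step in range(1, NUM_STEPS + 1):
        logits = toy_denoiser(prefix_ids, target)
        predictions = logits.argmax(axis=-1)   # most likely token per position
        confidences = logits.max(axis=-1)      # its (unnormalized) score
        # Commit an increasing fraction of positions, highest confidence first.
        num_to_commit = int(TARGET_LEN * step / NUM_STEPS)
        commit = np.argsort(-confidences)[:num_to_commit]
        target[commit] = predictions[commit]
    return target


print(diffusion_decode(prefix_ids=[7, 13, 42]))  # all positions decoded in 4 model calls
```

Because each step updates every position at once, the number of model calls is fixed by the step count rather than by the output length, which is the source of the potential speedup over token-by-token AR decoding that the abstract highlights for long text generation.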
Saved in:
Published in: | arXiv.org 2024-01 |
---|---|
Main authors: | Han, Kehang; Kenealy, Kathleen; Barua, Aditya; Fiedel, Noah; Constant, Noah |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Han, Kehang; Kenealy, Kathleen; Barua, Aditya; Fiedel, Noah; Constant, Noah |
description | In this report, we explore the potential for text diffusion to replace autoregressive (AR) decoding for the training and deployment of large language models (LLMs). We are particularly interested to see whether pretrained AR models can be transformed into text diffusion models through a lightweight adaptation procedure we call ``AR2Diff''. We begin by establishing a strong baseline setup for training text diffusion models. Comparing across multiple architectures and pretraining objectives, we find that training a decoder-only model with a prefix LM objective is best or near-best across several tasks. Building on this finding, we test various transfer learning setups for text diffusion models. On machine translation, we find that text diffusion underperforms the standard AR approach. However, on code synthesis and extractive QA, we find diffusion models trained from scratch outperform AR models in many cases. We also observe quality gains from AR2Diff -- adapting AR models to use diffusion decoding. These results are promising given that text diffusion is relatively underexplored and can be significantly faster than AR decoding for long text generation. |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-01 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2920355424 |
source | Free E-Journals |
subjects | Autoregressive processes; Decoding; Diffusion rate; Large language models; Machine translation; Quality assurance; Training |
title | Transfer Learning for Text Diffusion Models |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T07%3A51%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Transfer%20Learning%20for%20Text%20Diffusion%20Models&rft.jtitle=arXiv.org&rft.au=Han,%20Kehang&rft.date=2024-01-30&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2920355424%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2920355424&rft_id=info:pmid/&rfr_iscdi=true |