Transfer Learning for Text Diffusion Models

In this report, we explore the potential for text diffusion to replace autoregressive (AR) decoding for the training and deployment of large language models (LLMs). We are particularly interested to see whether pretrained AR models can be transformed into text diffusion models through a lightweight adaptation procedure we call "AR2Diff". We begin by establishing a strong baseline setup for training text diffusion models. Comparing across multiple architectures and pretraining objectives, we find that training a decoder-only model with a prefix LM objective is best or near-best across several tasks. Building on this finding, we test various transfer learning setups for text diffusion models. On machine translation, we find that text diffusion underperforms the standard AR approach. However, on code synthesis and extractive QA, we find diffusion models trained from scratch outperform AR models in many cases. We also observe quality gains from AR2Diff -- adapting AR models to use diffusion decoding. These results are promising given that text diffusion is relatively underexplored and can be significantly faster than AR decoding for long text generation.
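The abstract contrasts autoregressive decoding with diffusion-style decoding and notes the latter's potential speed advantage for long outputs. As a rough illustration only, the Python sketch below mocks both regimes with a dummy scoring function; it is an assumption-laden toy (all names and the scoring stand-in are invented here), not the authors' AR2Diff procedure or any real model.

# Illustrative sketch (assumption: not the paper's code). AR decoding makes one
# model call per generated token; diffusion-style decoding refines all positions
# in parallel over a fixed number of denoising steps.
import random

VOCAB = ["the", "cat", "sat", "on", "mat"]
MASK = "<mask>"

def dummy_scores(tokens):
    # Stand-in for a trained network: one score per vocabulary word for every
    # position. A real text diffusion LM would condition on the prompt and on
    # the current partially masked sequence.
    return [[random.random() for _ in VOCAB] for _ in tokens]

def ar_decode(length):
    # Autoregressive decoding: `length` sequential model calls.
    out = []
    for _ in range(length):
        scores = dummy_scores(out + [MASK])[-1]
        out.append(VOCAB[scores.index(max(scores))])
    return out

def diffusion_decode(length, steps=4):
    # Diffusion-style decoding: start fully masked, then run a fixed number of
    # parallel refinement steps (steps can be much smaller than length).
    out = [MASK] * length
    for _ in range(steps):
        out = [VOCAB[s.index(max(s))] for s in dummy_scores(out)]
    return out

if __name__ == "__main__":
    print("AR (8 model calls):       ", ar_decode(8))
    print("Diffusion (4 model calls):", diffusion_decode(8))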

Bibliographic Details
Published in: arXiv.org, 2024-01
Main Authors: Han, Kehang; Kenealy, Kathleen; Barua, Aditya; Fiedel, Noah; Constant, Noah
Format: Article
Language: English
EISSN: 2331-8422
Publisher: Cornell University Library, arXiv.org (Ithaca)
Subjects: Autoregressive processes; Decoding; Diffusion rate; Large language models; Machine translation; Quality assurance; Training
Online Access: Full text