Transfer Learning for Text Diffusion Models
In this report, we explore the potential for text diffusion to replace autoregressive (AR) decoding for the training and deployment of large language models (LLMs). We are particularly interested to see whether pretrained AR models can be transformed into text diffusion models through a lightweight adaptation procedure we call "AR2Diff".
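To make the contrast with AR decoding concrete, the sketch below illustrates one common form of non-autoregressive text diffusion decoding: all target positions start as a mask token and are re-predicted in parallel over a fixed number of denoising steps, committing the most confident positions first. This is a generic, MaskGIT/SUNDAE-style illustration rather than the specific AR2Diff procedure from the paper; `toy_denoiser`, `MASK_ID`, the vocabulary size, and the step schedule are all hypothetical stand-ins.

```python
import numpy as np

VOCAB_SIZE = 100   # hypothetical vocabulary size
MASK_ID = 0        # hypothetical id of the mask token
TARGET_LEN = 8     # length of the generated target sequence
NUM_STEPS = 4      # number of parallel denoising steps
rng = np.random.default_rng(0)


def toy_denoiser(prefix_ids, target_ids):
    """Hypothetical stand-in for a trained decoder-only model with a
    prefix-LM attention pattern: returns logits over the vocabulary for
    every target position, conditioned on the prefix and the current
    (partially masked) target."""
    return rng.normal(size=(len(target_ids), VOCAB_SIZE))


def diffusion_decode(prefix_ids):
    # Start from a fully masked target and refine it over NUM_STEPS passes.
    target = np.full(TARGET_LEN, MASK_ID)
    for step in range(1, NUM_STEPS + 1):
        logits = toy_denoiser(prefix_ids, target)
        predictions = logits.argmax(axis=-1)   # most likely token per position
        confidences = logits.max(axis=-1)      # its (unnormalized) score
        # Commit an increasing fraction of positions, highest confidence first.
        num_to_commit = int(TARGET_LEN * step / NUM_STEPS)
        commit = np.argsort(-confidences)[:num_to_commit]
        target[commit] = predictions[commit]
    return target


print(diffusion_decode(prefix_ids=[7, 13, 42]))  # all positions decoded in 4 model calls
```

Because each step updates every position at once, the number of model calls is fixed by the step count rather than by the output length, which is the source of the potential speedup over token-by-token AR decoding that the abstract highlights for long text generation.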
Saved in:
Published in: | arXiv.org 2024-01 |
---|---|
Main authors: | Han, Kehang; Kenealy, Kathleen; Barua, Aditya; Fiedel, Noah; Constant, Noah |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Han, Kehang; Kenealy, Kathleen; Barua, Aditya; Fiedel, Noah; Constant, Noah |
description | In this report, we explore the potential for text diffusion to replace autoregressive (AR) decoding for the training and deployment of large language models (LLMs). We are particularly interested to see whether pretrained AR models can be transformed into text diffusion models through a lightweight adaptation procedure we call ``AR2Diff''. We begin by establishing a strong baseline setup for training text diffusion models. Comparing across multiple architectures and pretraining objectives, we find that training a decoder-only model with a prefix LM objective is best or near-best across several tasks. Building on this finding, we test various transfer learning setups for text diffusion models. On machine translation, we find that text diffusion underperforms the standard AR approach. However, on code synthesis and extractive QA, we find diffusion models trained from scratch outperform AR models in many cases. We also observe quality gains from AR2Diff -- adapting AR models to use diffusion decoding. These results are promising given that text diffusion is relatively underexplored and can be significantly faster than AR decoding for long text generation. |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-01 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2920355424 |
source | Free E-Journals |
subjects | Autoregressive processes; Decoding; Diffusion rate; Large language models; Machine translation; Quality assurance; Training |
title | Transfer Learning for Text Diffusion Models |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-09T07%3A51%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Transfer%20Learning%20for%20Text%20Diffusion%20Models&rft.jtitle=arXiv.org&rft.au=Han,%20Kehang&rft.date=2024-01-30&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2920355424%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2920355424&rft_id=info:pmid/&rfr_iscdi=true |