FINE: Factorizing Knowledge for Initialization of Variable-sized Diffusion Models

Diffusion models often face slow convergence, and existing efficient training techniques, such as Parameter-Efficient Fine-Tuning (PEFT), are primarily designed for fine-tuning pre-trained models. However, these methods are limited in adapting models to variable sizes for real-world deployment, where no corresponding pre-trained models exist. To address this, we introduce FINE, a method based on the Learngene framework, for initializing downstream networks by leveraging pre-trained models while accounting for both model size and task-specific requirements. FINE decomposes pre-trained knowledge into a product of matrices (i.e., \(U\), \(\Sigma\), and \(V\)), where \(U\) and \(V\) are shared across network blocks as "learngenes" and \(\Sigma\) remains layer-specific. During initialization, FINE trains only \(\Sigma\) on a small subset of data while keeping the learngene parameters fixed, making it the first approach to integrate both size and task considerations in initialization. We provide a comprehensive benchmark for learngene-based methods on image generation tasks, and extensive experiments demonstrate that FINE consistently outperforms direct pre-training, particularly for smaller models, achieving state-of-the-art results across variable model sizes. FINE also offers significant computational and storage savings, reducing training steps by approximately \(3N\times\) and storage by \(5\times\), where \(N\) is the number of models. Additionally, FINE's task adaptability yields average improvements of 4.29 in FID and 3.30 in sFID across multiple downstream datasets, highlighting its versatility and efficiency.
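To make the factorization concrete, here is a minimal sketch, not the authors' implementation: it assumes a plain linear layer whose pre-trained weight is decomposed by SVD, with the truncated \(U\) and \(V\) frozen as shared learngenes and only the layer-specific \(\Sigma\) (here a vector of singular values) left trainable. All names (FactorizedLinear, build_from_pretrained, rank) are hypothetical.

```python
# Minimal sketch (assumptions, not the authors' code) of the factorized
# initialization described in the abstract: a pre-trained weight matrix is
# decomposed via SVD into U, Sigma, V; U and V are frozen as shared
# "learngenes", and only the layer-specific sigma is trainable.
import torch
import torch.nn as nn


class FactorizedLinear(nn.Module):
    """Linear layer with weight W ~= U @ diag(sigma) @ V^T; only sigma trains."""

    def __init__(self, U: torch.Tensor, V: torch.Tensor, rank: int):
        super().__init__()
        # Shared, frozen factors (the "learngenes"); buffers are not optimized.
        self.register_buffer("U", U)   # (out_features, rank)
        self.register_buffer("V", V)   # (in_features, rank)
        # Layer-specific singular values: the only trainable parameters.
        self.sigma = nn.Parameter(torch.ones(rank))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Scale the columns of U by sigma, i.e. U @ diag(sigma), then apply V^T.
        weight = (self.U * self.sigma) @ self.V.t()  # (out_features, in_features)
        return x @ weight.t()


def build_from_pretrained(pretrained_weight: torch.Tensor, rank: int) -> FactorizedLinear:
    """Extract rank-truncated U, V from a pre-trained weight via SVD."""
    U, S, Vh = torch.linalg.svd(pretrained_weight, full_matrices=False)
    layer = FactorizedLinear(U[:, :rank].clone(), Vh[:rank].t().clone(), rank)
    # Warm-start sigma from the pre-trained singular values before tuning it.
    with torch.no_grad():
        layer.sigma.copy_(S[:rank])
    return layer
```

Under this reading, sharing the frozen \(U\) and \(V\) across blocks means each downstream model of a given size only needs to store and train its per-layer sigma vectors, which is consistent with the storage and training-step savings the abstract reports.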

Bibliographic details
Published in: arXiv.org, 2024-09
Authors: Xie, Yucheng; Fu, Feng; Shi, Ruixiao; Wang, Jing; Geng, Xin
Format: Article
Language: English
Subjects: Diffusion rate; Image processing; Parameters
Online access: Full text
EISSN: 2331-8422
Source: Freely Accessible Journals
Full-text URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-15T22%3A25%3A55IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=FINE:%20Factorizing%20Knowledge%20for%20Initialization%20of%20Variable-sized%20Diffusion%20Models&rft.jtitle=arXiv.org&rft.au=Xie,%20Yucheng&rft.date=2024-09-28&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3111726488%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3111726488&rft_id=info:pmid/&rfr_iscdi=true