QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
The practical deployment of diffusion models still suffers from high memory and time overhead. While quantization paves a way for compression and acceleration, existing methods unfortunately fail when the models are quantized to low bit-widths. In this paper, we empirically unravel three properties in...
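The abstract (given in full in the description field below) pairs low-bit quantization with selective finetuning of two kinds of layers: those carrying temporal information and those most sensitive to reduced bit-width. The snippet below is a minimal, hypothetical PyTorch-style sketch of that idea; the uniform fake quantizer, the `time_emb` naming convention, and the sensitivity keywords are illustrative assumptions, not the authors' implementation (which is available in the linked repository).

```python
# Illustrative sketch only: uniform fake quantization plus selective unfreezing
# of time-embedding and quantization-sensitive layers. Layer names and the
# quantizer are assumptions for demonstration, not QuEST's actual code.
import torch
import torch.nn as nn


def fake_quantize(x: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Uniform asymmetric quantize-then-dequantize, e.g. to simulate W4A4."""
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(-x.min() / scale)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale


def select_trainable(model: nn.Module, sensitive_keywords=("attn", "proj_out")):
    """Freeze all parameters, then unfreeze the two critical layer types:
    (a) layers holding temporal information (here: names containing 'time_emb'),
    (b) layers flagged as sensitive to reduced bit-width (here: keyword match)."""
    trainable = []
    for name, param in model.named_parameters():
        keep = ("time_emb" in name) or any(k in name for k in sensitive_keywords)
        param.requires_grad = keep
        if keep:
            trainable.append(name)
    return trainable
```

In this reading, the quantized model would be finetuned with only the returned parameter subset updated, so the quantized layers adapt to the imbalanced activation distribution without retraining the full network.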
Saved in:
Main authors: | Wang, Haoxuan; Shang, Yuzhang; Yuan, Zhihang; Wu, Junyi; Yan, Junchi; Yan, Yan |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computer Vision and Pattern Recognition |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Wang, Haoxuan; Shang, Yuzhang; Yuan, Zhihang; Wu, Junyi; Yan, Junchi; Yan, Yan
description | The practical deployment of diffusion models still suffers from high memory and time overhead. While quantization paves a way for compression and acceleration, existing methods unfortunately fail when the models are quantized to low bit-widths. In this paper, we empirically unravel three properties in quantized diffusion models that compromise the efficacy of current methods: imbalanced activation distributions, imprecise temporal information, and vulnerability to perturbations of specific modules. To alleviate the intensified low-bit quantization difficulty stemming from the distribution imbalance, we propose finetuning the quantized model to better adapt to the activation distribution. Building on this idea, we identify two critical types of quantized layers: those holding vital temporal information and those sensitive to reduced bit-width, and finetune them to mitigate performance degradation with efficiency. We empirically verify that our approach modifies the activation distribution and provides meaningful temporal information, facilitating easier and more accurate quantization. Our method is evaluated on three high-resolution image generation tasks and achieves state-of-the-art performance under various bit-width settings, and it is the first method to generate readable images on full 4-bit (i.e., W4A4) Stable Diffusion. Code is available at https://github.com/hatchetProject/QuEST. |
doi_str_mv | 10.48550/arxiv.2402.03666 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2402.03666 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2402_03666 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition |
title | QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning |
url | https://arxiv.org/abs/2402.03666