QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning

The practical deployment of diffusion models still suffers from the high memory and time overhead. While quantization paves a way for compression and acceleration, existing methods unfortunately fail when the models are quantized to low-bits. In this paper, we empirically unravel three properties in quantized diffusion models that compromise the efficacy of current methods: imbalanced activation distributions, imprecise temporal information, and vulnerability to perturbations of specific modules. To alleviate the intensified low-bit quantization difficulty stemming from the distribution imbalance, we propose finetuning the quantized model to better adapt to the activation distribution. Building on this idea, we identify two critical types of quantized layers: those holding vital temporal information and those sensitive to reduced bit-width, and finetune them to mitigate performance degradation with efficiency. We empirically verify that our approach modifies the activation distribution and provides meaningful temporal information, facilitating easier and more accurate quantization. Our method is evaluated over three high-resolution image generation tasks and achieves state-of-the-art performance under various bit-width settings, as well as being the first method to generate readable images on full 4-bit (i.e. W4A4) Stable Diffusion. Code is available \href{https://github.com/hatchetProject/QuEST}{here}.
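For intuition, the low-bit setting the abstract refers to (e.g. W4A4, meaning 4-bit weights and 4-bit activations) maps floating-point tensors onto a grid of at most 2^4 = 16 levels. Below is a minimal sketch of simulated ("fake") uniform asymmetric quantization; it is a generic illustration of the bit-width constraint, not the paper's actual quantizer, and the function name is ours:

```python
import numpy as np

def uniform_quantize(x, n_bits=4):
    """Simulate uniform asymmetric quantization to n_bits levels.

    Generic illustration of low-bit quantization (not QuEST's method):
    the tensor's value range is mapped onto 2**n_bits integer levels,
    then dequantized back to float so the rounding error is visible.
    """
    qmax = 2 ** n_bits - 1
    scale = (x.max() - x.min()) / qmax          # step size covering the range
    zero_point = np.round(-x.min() / scale)     # shift so x.min() maps near level 0
    q = np.clip(np.round(x / scale + zero_point), 0, qmax)
    return (q - zero_point) * scale             # dequantize back to float

rng = np.random.default_rng(0)
w = rng.normal(size=(8, 8)).astype(np.float32)  # stand-in weight tensor
w_q = uniform_quantize(w, n_bits=4)
err = np.abs(w - w_q).max()                     # worst-case rounding error
```

With only 16 representable levels the reconstruction error per element is bounded by roughly one quantization step, which is why naive 4-bit quantization of both weights and activations degrades generation quality and motivates the selective finetuning the abstract describes.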

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Wang, Haoxuan; Shang, Yuzhang; Yuan, Zhihang; Wu, Junyi; Yan, Junchi; Yan, Yan
Format: Article
Language: English (eng)
Subjects:
Online Access: Request full text
creator Wang, Haoxuan; Shang, Yuzhang; Yuan, Zhihang; Wu, Junyi; Yan, Junchi; Yan, Yan
description The practical deployment of diffusion models still suffers from the high memory and time overhead. While quantization paves a way for compression and acceleration, existing methods unfortunately fail when the models are quantized to low-bits. In this paper, we empirically unravel three properties in quantized diffusion models that compromise the efficacy of current methods: imbalanced activation distributions, imprecise temporal information, and vulnerability to perturbations of specific modules. To alleviate the intensified low-bit quantization difficulty stemming from the distribution imbalance, we propose finetuning the quantized model to better adapt to the activation distribution. Building on this idea, we identify two critical types of quantized layers: those holding vital temporal information and those sensitive to reduced bit-width, and finetune them to mitigate performance degradation with efficiency. We empirically verify that our approach modifies the activation distribution and provides meaningful temporal information, facilitating easier and more accurate quantization. Our method is evaluated over three high-resolution image generation tasks and achieves state-of-the-art performance under various bit-width settings, as well as being the first method to generate readable images on full 4-bit (i.e. W4A4) Stable Diffusion. Code is available \href{https://github.com/hatchetProject/QuEST}{here}.
doi_str_mv 10.48550/arxiv.2402.03666
format Article
fullrecord arXiv record (open access): published 2024-02-05; rights: http://creativecommons.org/licenses/by/4.0; full text: https://arxiv.org/abs/2402.03666
identifier DOI: 10.48550/arxiv.2402.03666
language eng
recordid cdi_arxiv_primary_2402_03666
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning