QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
The practical deployment of diffusion models still suffers from high memory and time overhead. While quantization paves a way for compression and acceleration, existing methods unfortunately fail when the models are quantized to low bit-widths. In this paper, we empirically unravel three properties in...
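The abstract (given in full in the description field below) pairs low-bit quantization with selective finetuning of two kinds of layers: those carrying temporal information and those most sensitive to reduced bit-width. The snippet below is a minimal, hypothetical PyTorch-style sketch of that idea; the uniform fake quantizer, the `time_emb` naming convention, and the sensitivity keywords are illustrative assumptions, not the authors' implementation (which is available in the linked repository).

```python
# Illustrative sketch only: uniform fake quantization plus selective unfreezing
# of time-embedding and quantization-sensitive layers. Layer names and the
# quantizer are assumptions for demonstration, not QuEST's actual code.
import torch
import torch.nn as nn


def fake_quantize(x: torch.Tensor, n_bits: int = 4) -> torch.Tensor:
    """Uniform asymmetric quantize-then-dequantize, e.g. to simulate W4A4."""
    qmin, qmax = 0, 2 ** n_bits - 1
    scale = (x.max() - x.min()).clamp(min=1e-8) / (qmax - qmin)
    zero_point = torch.round(-x.min() / scale)
    q = torch.clamp(torch.round(x / scale) + zero_point, qmin, qmax)
    return (q - zero_point) * scale


def select_trainable(model: nn.Module, sensitive_keywords=("attn", "proj_out")):
    """Freeze all parameters, then unfreeze the two critical layer types:
    (a) layers holding temporal information (here: names containing 'time_emb'),
    (b) layers flagged as sensitive to reduced bit-width (here: keyword match)."""
    trainable = []
    for name, param in model.named_parameters():
        keep = ("time_emb" in name) or any(k in name for k in sensitive_keywords)
        param.requires_grad = keep
        if keep:
            trainable.append(name)
    return trainable
```

In this reading, the quantized model would be finetuned with only the returned parameter subset updated, so the quantized layers adapt to the imbalanced activation distribution without retraining the full network.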
Saved in:
Main authors: | Wang, Haoxuan; Shang, Yuzhang; Yuan, Zhihang; Wu, Junyi; Yan, Junchi; Yan, Yan |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computer Vision and Pattern Recognition |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Wang, Haoxuan; Shang, Yuzhang; Yuan, Zhihang; Wu, Junyi; Yan, Junchi; Yan, Yan
description | The practical deployment of diffusion models still suffers from high memory and time overhead. While quantization paves a way for compression and acceleration, existing methods unfortunately fail when the models are quantized to low bit-widths. In this paper, we empirically unravel three properties in quantized diffusion models that compromise the efficacy of current methods: imbalanced activation distributions, imprecise temporal information, and vulnerability to perturbations of specific modules. To alleviate the intensified low-bit quantization difficulty stemming from the distribution imbalance, we propose finetuning the quantized model to better adapt to the activation distribution. Building on this idea, we identify two critical types of quantized layers: those holding vital temporal information and those sensitive to reduced bit-width, and finetune them to mitigate performance degradation with efficiency. We empirically verify that our approach modifies the activation distribution and provides meaningful temporal information, facilitating easier and more accurate quantization. Our method is evaluated on three high-resolution image generation tasks and achieves state-of-the-art performance under various bit-width settings, and it is the first method to generate readable images on full 4-bit (i.e., W4A4) Stable Diffusion. Code is available at https://github.com/hatchetProject/QuEST. |
doi_str_mv | 10.48550/arxiv.2402.03666 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2402.03666 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2402_03666 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition |
title | QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning |
url | https://arxiv.org/abs/2402.03666