QuEST: Low-bit Diffusion Model Quantization via Efficient Selective Finetuning
Format: Article
Language: English
Online Access: Order full text
Abstract: The practical deployment of diffusion models still suffers from high
memory and time overhead. While quantization paves the way for compression and
acceleration, existing methods unfortunately fail when models are quantized
to low bit-widths. In this paper, we empirically unravel three properties of
quantized diffusion models that compromise the efficacy of current methods:
imbalanced activation distributions, imprecise temporal information, and
vulnerability to perturbations of specific modules. To alleviate the
intensified low-bit quantization difficulty stemming from the distribution
imbalance, we propose finetuning the quantized model to better adapt to the
activation distribution. Building on this idea, we identify two critical types
of quantized layers: those holding vital temporal information and those
sensitive to reduced bit-width, and finetune them to efficiently mitigate
performance degradation. We empirically verify that our approach modifies
the activation distribution and provides meaningful temporal information,
facilitating easier and more accurate quantization. Our method is evaluated
over three high-resolution image generation tasks and achieves state-of-the-art
performance under various bit-width settings, as well as being the first method
to generate readable images on fully 4-bit (i.e., W4A4) Stable Diffusion. Code
is available at https://github.com/hatchetProject/QuEST.
DOI: 10.48550/arxiv.2402.03666
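The "imbalanced activation distributions" problem the abstract identifies can be illustrated with a minimal sketch, not the paper's method: under simple min-max uniform quantization, a few large outliers stretch the quantization range, so the step size for the bulk of the values grows and 4-bit error rises. The `quantize_uniform` helper and the synthetic distributions below are illustrative assumptions, not code from QuEST.

```python
import numpy as np

def quantize_uniform(x, n_bits=4):
    # Min-max uniform quantization to n_bits, then dequantize back to float.
    qmax = 2 ** n_bits - 1
    lo, hi = float(x.min()), float(x.max())
    scale = (hi - lo) / qmax if hi > lo else 1.0
    q = np.clip(np.round((x - lo) / scale), 0, qmax)
    return q * scale + lo

rng = np.random.default_rng(0)
# A roughly symmetric activation tensor vs. one with rare large outliers
# (a stand-in for the imbalanced distributions described in the abstract).
balanced = rng.normal(0.0, 1.0, 10_000)
imbalanced = np.concatenate([
    rng.normal(0.0, 1.0, 9_900),
    rng.uniform(10.0, 20.0, 100),  # outliers stretch the min-max range
])

for name, x in [("balanced", balanced), ("imbalanced", imbalanced)]:
    mse = np.mean((x - quantize_uniform(x, n_bits=4)) ** 2)
    print(f"{name:10s} 4-bit quantization MSE = {mse:.4f}")
```

The imbalanced tensor yields a noticeably larger reconstruction error even though 99% of its values look like the balanced case, which is why the paper argues for adapting the model to the activation distribution rather than quantizing it as-is.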