An Analysis on Quantizing Diffusion Transformers
Main authors: , , , ,
Format: Article
Language: eng
Subjects:
Online access: Order full text
Abstract: Diffusion Models (DMs) utilize an iterative denoising process to transform
random noise into synthetic data. Initially proposed with a UNet structure, DMs
excel at producing images that are virtually indistinguishable from real ones,
with or without conditioning on text prompts. Transformer-only structures were
later combined with DMs to achieve better performance. Although Latent Diffusion
Models (LDMs) reduce the computational requirements by denoising in a latent
space, inference remains extremely expensive on any device due to the sheer
volume of parameters and feature sizes. Post-Training Quantization (PTQ) offers
an immediate remedy, yielding smaller storage and more memory-efficient
computation during inference. Prior works on PTQ of UNet-based DMs have
addressed the challenges of calibrating parameters for both activations and
weights via moderate optimization. In this work, we pioneer an efficient PTQ
method for the transformer-only structure that requires no optimization. By
analysing the challenges of quantizing activations and weights in diffusion
transformers, we propose single-step sampling calibration for activations and
adapt group-wise quantization for weights to enable low-bit quantization. We
demonstrate the efficiency and effectiveness of the proposed methods with
preliminary experiments on conditional image generation.
DOI: 10.48550/arxiv.2406.11100
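The abstract names two concrete ingredients: group-wise quantization for weights and single-step sampling calibration for activations. The sketch below is a minimal NumPy illustration of what these two pieces could look like; the function names, the group size, and the `run_single_step` hook are hypothetical assumptions for illustration, not the paper's implementation.

```python
import numpy as np

def groupwise_quantize_weights(w, n_bits=4, group_size=128):
    """Symmetric group-wise weight quantization (illustrative sketch only).

    Each group of `group_size` consecutive input channels gets its own scale,
    so a single outlier channel cannot inflate the range of an entire row.
    """
    out_features, in_features = w.shape
    assert in_features % group_size == 0, "in_features must divide evenly into groups"
    qmax = 2 ** (n_bits - 1) - 1                       # e.g. 7 for 4-bit signed
    groups = w.reshape(out_features, in_features // group_size, group_size)
    scales = np.abs(groups).max(axis=-1, keepdims=True) / qmax
    scales = np.maximum(scales, 1e-8)                  # guard against all-zero groups
    q = np.clip(np.round(groups / scales), -qmax - 1, qmax)
    return (q * scales).reshape(out_features, in_features)  # fake-quantized weights

def calibrate_activation_ranges(run_single_step, latents, timestep):
    """Record per-tensor activation ranges from one denoising step.

    `run_single_step` is a hypothetical hook-instrumented forward pass that
    returns the intermediate activations produced at the given timestep; the
    resulting (min, max) pairs would then fix the activation quantizer ranges.
    """
    activations = run_single_step(latents, timestep)
    return [(float(a.min()), float(a.max())) for a in activations]

# Toy usage: quantize a random weight matrix to 4 bits, 128 channels per group.
w = np.random.randn(64, 256).astype(np.float32)
w_q = groupwise_quantize_weights(w, n_bits=4, group_size=128)
print("mean abs quantization error:", np.abs(w - w_q).mean())
```

The appeal of this combination, as the abstract describes it, is that both steps are optimization-free: the weight scales come directly from the stored weights, and the activation ranges come from a single sampling step rather than an iterative calibration procedure.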