An Analysis on Quantizing Diffusion Transformers
creator | Yang, Yuewei; Wang, Jialiang; Dai, Xiaoliang; Zhang, Peizhao; Zhang, Hongbo |
description | Diffusion Models (DMs) utilize an iterative denoising process to transform
random noise into synthetic data. Initially proposed with a UNet structure, DMs
excel at producing images that are virtually indistinguishable from real ones,
with or without conditioning text prompts. Transformer-only architectures have
since been combined with DMs to achieve better performance. Although Latent
Diffusion Models (LDMs) reduce the computational requirement by denoising in a
latent space, inference remains extremely expensive on any device because of
the sheer volume of parameters and feature sizes. Post-Training Quantization
(PTQ) offers an immediate remedy: a smaller storage footprint and more
memory-efficient computation during inference. Prior works on PTQ of DMs with
UNet structures have addressed the challenges of calibrating parameters for
both activations and weights via moderate optimization. In this work, we
pioneer an efficient PTQ method for transformer-only structures that requires
no optimization. By analysing the challenges of quantizing activations and
weights for diffusion transformers, we propose a single-step sampling
calibration for activations and adopt group-wise quantization for weights to
enable low-bit quantization. We demonstrate the efficiency and effectiveness
of the proposed methods with preliminary experiments on conditional image
generation. |
doi_str_mv | 10.48550/arxiv.2406.11100 |
format | Article |
identifier | DOI: 10.48550/arxiv.2406.11100 |
language | eng |
recordid | cdi_arxiv_primary_2406_11100 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition |
title | An Analysis on Quantizing Diffusion Transformers |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-08T07%3A11%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20Analysis%20on%20Quantizing%20Diffusion%20Transformers&rft.au=Yang,%20Yuewei&rft.date=2024-06-16&rft_id=info:doi/10.48550/arxiv.2406.11100&rft_dat=%3Carxiv_GOX%3E2406_11100%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
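
The abstract above names two concrete techniques: group-wise quantization of the weights and a single-step sampling calibration of the activations. The following PyTorch sketch is not the authors' implementation; it is a minimal illustration of what such a post-training quantization pass can look like. The group size, bit widths, the restriction to `torch.nn.Linear` modules, and the assumed forward signature `model(latents, timestep)` are all choices made for the example only.

```python
import torch


def quantize_weights_groupwise(weight: torch.Tensor, n_bits: int = 4, group_size: int = 128):
    """Symmetric group-wise quantization of a 2-D weight matrix.

    Each group of `group_size` input channels gets its own scale, so a single
    outlier channel only affects the precision of its own group.
    """
    out_features, in_features = weight.shape
    assert in_features % group_size == 0, "sketch assumes in_features is divisible by group_size"
    w = weight.reshape(out_features, in_features // group_size, group_size)

    qmax = 2 ** (n_bits - 1) - 1                                  # e.g. 7 for 4-bit signed
    scale = w.abs().amax(dim=-1, keepdim=True).clamp(min=1e-8) / qmax
    q = torch.clamp(torch.round(w / scale), -qmax - 1, qmax)      # integer codes per group
    dequant = (q * scale).reshape(out_features, in_features)      # dequantized weights used at inference
    return q.to(torch.int8), scale, dequant


def calibrate_activation_scales(model, latents, timestep, n_bits: int = 8):
    """Estimate per-tensor activation scales from one denoising step.

    Hooks every Linear layer, runs a single forward pass at `timestep`, and
    turns the observed absolute maxima into quantization scales.
    """
    stats = {}

    def make_hook(name):
        def hook(_module, _inputs, output):
            stats[name] = max(stats.get(name, 0.0), output.detach().abs().max().item())
        return hook

    handles = [m.register_forward_hook(make_hook(n))
               for n, m in model.named_modules()
               if isinstance(m, torch.nn.Linear)]
    with torch.no_grad():
        model(latents, timestep)          # one sampling step only; signature is an assumption
    for h in handles:
        h.remove()

    qmax = 2 ** (n_bits - 1) - 1
    return {name: amax / qmax for name, amax in stats.items()}
```

Per-group scales keep one outlier channel from inflating the quantization step of an entire weight matrix, which is why group-wise schemes generally degrade more gracefully at low bit widths than per-tensor quantization; collecting activation statistics from a single denoising step, rather than the full sampling trajectory, mirrors the calibration idea described in the abstract.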