Tackling the Generative Learning Trilemma with Denoising Diffusion GANs

A wide variety of deep generative models has been developed in the past decade. Yet, these models often struggle to simultaneously address three key requirements: high sample quality, mode coverage, and fast sampling. We call the challenge imposed by these requirements the generative learning trilemma, as existing models often trade some of them for others. In particular, denoising diffusion models have shown impressive sample quality and diversity, but their expensive sampling does not yet allow them to be applied in many real-world applications. In this paper, we argue that slow sampling in these models stems fundamentally from the Gaussian assumption in the denoising step, which is justified only for small step sizes. To enable denoising with large steps, and hence reduce the total number of denoising steps, we propose to model the denoising distribution with a complex multimodal distribution. We introduce denoising diffusion generative adversarial networks (denoising diffusion GANs) that model each denoising step using a multimodal conditional GAN. Through extensive evaluations, we show that denoising diffusion GANs obtain sample quality and diversity competitive with original diffusion models while being 2000× faster on the CIFAR-10 dataset. Compared to traditional GANs, our model exhibits better mode coverage and sample diversity. To the best of our knowledge, denoising diffusion GAN is the first model that reduces sampling cost in diffusion models to an extent that allows them to be applied to real-world applications inexpensively. Project page and code can be found at https://nvlabs.github.io/denoising-diffusion-gan
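
The abstract's core mechanism lends itself to a short illustration. The sketch below shows few-step ancestral sampling in which each denoising step is produced by a conditional GAN generator rather than a Gaussian; the generator interface G(x_t, z, t) -> predicted x_0, the 4-step noise schedule, the latent dimension, and the posterior-sampling helper are illustrative assumptions for this sketch, not details confirmed by the paper.

```python
# Minimal sketch of few-step sampling with a denoising diffusion GAN, per the
# abstract: each denoising step p(x_{t-1} | x_t) is modeled by a multimodal
# conditional GAN instead of a Gaussian. The generator signature, the coarse
# schedule, and the latent size below are assumptions made for this sketch.
import torch

T = 4                                      # a few large denoising steps instead of ~1000 small ones
betas = torch.linspace(0.1, 0.9, T)        # assumed coarse noise schedule
alphas = 1.0 - betas
alpha_bars = torch.cumprod(alphas, dim=0)  # \bar{alpha}_t = prod_{s<=t} alpha_s

def posterior_sample(x_t, x0_pred, t):
    """Sample x_{t-1} from the tractable Gaussian posterior q(x_{t-1} | x_t, x_0),
    once the generator has proposed a clean sample x_0."""
    if t == 0:
        return x0_pred
    ab_t, ab_prev = alpha_bars[t], alpha_bars[t - 1]
    coef_x0 = torch.sqrt(ab_prev) * betas[t] / (1.0 - ab_t)
    coef_xt = torch.sqrt(alphas[t]) * (1.0 - ab_prev) / (1.0 - ab_t)
    var = betas[t] * (1.0 - ab_prev) / (1.0 - ab_t)
    mean = coef_x0 * x0_pred + coef_xt * x_t
    return mean + var.sqrt() * torch.randn_like(x_t)

def sample(generator, shape):
    """Ancestral sampling: T GAN-denoising steps from pure noise to images."""
    x_t = torch.randn(shape)               # start from x_T ~ N(0, I)
    for t in reversed(range(T)):
        z = torch.randn(shape[0], 64)      # fresh latent z makes p(x_{t-1} | x_t) multimodal
        x0_pred = generator(x_t, z, t)     # conditional GAN proposes a clean image
        x_t = posterior_sample(x_t, x0_pred, t)
    return x_t

# Stand-in generator so the sketch runs end to end; a real model would be an
# adversarially trained network conditioned on x_t, z, and the step index t.
dummy_generator = lambda x_t, z, t: 0.5 * x_t
images = sample(dummy_generator, (2, 3, 32, 32))  # e.g., two 32x32 RGB samples
print(images.shape)  # torch.Size([2, 3, 32, 32])
```

Because the latent z is resampled at every step, each denoising distribution can be multimodal, which is what allows the step sizes to be large; with a trained generator in place of the stand-in, the same loop would produce samples in T forward passes rather than the hundreds or thousands required when each step is assumed Gaussian.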

Bibliographic Details
Main Authors: Xiao, Zhisheng; Kreis, Karsten; Vahdat, Arash
Format: Article
Language: English
Subjects: Computer Science - Learning; Statistics - Machine Learning
Online Access: Order full text
DOI: 10.48550/arxiv.2112.07804
Source: arXiv.org