Depth-aware guidance with self-estimated depth representations of diffusion models


Detailed description

Saved in:
Bibliographic details
Published in: Pattern Recognition 2024-09, Vol. 153, p. 110474, Article 110474
Main authors: Kim, Gyeongnyeon, Jang, Wooseok, Lee, Gyuseong, Hong, Susung, Seo, Junyoung, Kim, Seungryong
Format: Article
Language: English
Subjects:
Online access: Full text
Description
Abstract: Diffusion models have recently shown significant advancement among generative models with their impressive fidelity and diversity. The success of these models can often be attributed to their use of sampling guidance techniques, such as classifier or classifier-free guidance, which provide effective mechanisms to trade off between fidelity and diversity. However, these methods are not capable of guiding a generated image to be aware of its geometric configuration, e.g., depth, which hinders their application to downstream tasks such as scene understanding that require a certain level of depth awareness. To overcome this limitation, we propose a novel sampling guidance method for diffusion models that uses self-predicted depth information derived from the rich intermediate representations of diffusion models. Concretely, we first present a label-efficient depth estimation framework using internal representations of diffusion models. Subsequently, we propose the incorporation of two guidance techniques during the sampling phase. These methods involve using pseudo-labeling and a depth-domain diffusion prior to self-condition the generated image on the estimated depth map. Experiments and comprehensive ablation studies demonstrate the effectiveness of our method in guiding diffusion models toward the generation of geometrically plausible images. Our project page is available at https://ku-cvlab.github.io/DAG/.
•We show that depth information is encoded in learned U-Net diffusion models.
•We train depth predictors on U-Net features efficiently using pretrained models.
•A novel framework injects depth awareness into images generated by diffusion models.
•New guidance methods apply consistency regularization and a learned depth-map prior.
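The pseudo-labeling guidance described in the abstract can be sketched as a classifier-guidance-style update: the predicted noise is shifted by the gradient of a depth-consistency loss between the depth estimated from the current sample and a pseudo-label depth map. The sketch below is a minimal toy illustration, not the authors' implementation; the names (`depth_estimate`, `depth_guided_noise`) and the linear stand-in depth predictor `W` are hypothetical, chosen so the gradient is analytic.

```python
import numpy as np

def depth_estimate(x, W):
    """Stand-in depth predictor: a fixed linear map of the sample.
    (The paper uses a depth head on U-Net features; this is a toy proxy.)"""
    return W @ x

def depth_guided_noise(eps, x, W, depth_pseudo, scale):
    """Shift the predicted noise eps by the gradient of the
    depth-consistency loss ||D(x) - d*||^2 with respect to x."""
    residual = depth_estimate(x, W) - depth_pseudo  # depth-consistency error
    grad = 2.0 * W.T @ residual                     # analytic gradient of the loss
    return eps + scale * grad                       # guided noise prediction

rng = np.random.default_rng(0)
x = rng.normal(size=4)                 # noisy sample x_t (flattened toy "image")
eps = rng.normal(size=4)               # model's predicted noise
W = rng.normal(size=(3, 4))            # toy linear depth predictor
d_star = depth_estimate(x, W) + 0.5    # pseudo-label depth map (offset on purpose)

eps_guided = depth_guided_noise(eps, x, W, d_star, scale=0.1)
```

With `scale=0` the update reduces to the unguided noise prediction; increasing `scale` trades sample freedom for depth consistency, analogous to the guidance-scale knob in classifier-free guidance.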
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2024.110474