SphereDiffusion: Spherical Geometry-Aware Distortion Resilient Diffusion Model
Controllable spherical panoramic image generation holds substantial application potential across a variety of domains. However, it remains a challenging task due to the inherent spherical distortion and geometry characteristics, resulting in low-quality content generation. In this paper, we introduce...
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | Controllable spherical panoramic image generation holds substantial
application potential across a variety of domains. However, it remains a
challenging task due to the inherent spherical distortion and geometry
characteristics, which result in low-quality content generation. In this paper,
we introduce a novel framework, SphereDiffusion, to address these unique
challenges and generate high-quality, precisely controllable spherical
panoramic images. For the spherical distortion characteristic, we embed the
semantics of the distorted object with text encoding, then explicitly
construct the relationship with text-object correspondence to better use the
pre-trained knowledge of planar images. Meanwhile, we employ a deformable
technique to mitigate the semantic deviation in latent space caused by
spherical distortion. For the spherical geometry characteristic, by virtue of
spherical rotation invariance, we improve the data diversity and optimization
objectives in the training process, enabling the model to better learn the
spherical geometry characteristic. Furthermore, we enhance the denoising
process of the diffusion model, enabling it to effectively use the learned
geometric characteristic to ensure the boundary continuity of the generated
images. With these specific techniques, experiments on the Structured3D dataset
show that SphereDiffusion significantly improves the quality of controllable
spherical image generation and achieves a relative FID reduction of around 35%
on average. |
---|---|
DOI: | 10.48550/arxiv.2403.10044 |
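
The summary appeals to spherical rotation invariance and to boundary continuity without giving details. As a rough illustration only, and not the authors' implementation, the sketch below assumes panoramas are stored in equirectangular layout, where a rotation about the vertical axis amounts to a circular shift of the image columns. The names `yaw_rotate` and `seam_gap` are hypothetical helpers introduced here; they do not come from the paper.

```python
from typing import List

# Assumption: a panorama is a list of rows of grayscale values in
# equirectangular layout, so the left and right edges meet on the sphere.
Image = List[List[float]]


def yaw_rotate(pano: Image, shift: int) -> Image:
    """Rotate about the vertical axis by rolling columns circularly.

    A positive shift moves content to the right; columns that leave the
    right edge re-enter on the left, so no pixels are lost.
    """
    width = len(pano[0])
    k = shift % width
    if k == 0:
        return [list(row) for row in pano]
    return [row[-k:] + row[:-k] for row in pano]


def seam_gap(pano: Image) -> float:
    """Mean absolute difference between the first and last columns.

    These columns are neighbours on the sphere, so a large value hints at
    a visible seam where the image wraps around.
    """
    return sum(abs(row[0] - row[-1]) for row in pano) / len(pano)


if __name__ == "__main__":
    pano = [[0.0, 1.0, 2.0, 3.0],
            [4.0, 5.0, 6.0, 7.0]]
    print(yaw_rotate(pano, 1))
    print(seam_gap(pano))
```

Because the shift is circular, shifting by the full image width returns the original panorama, which is what makes this kind of rotation usable as a lossless augmentation. How SphereDiffusion actually uses rotation in training and denoising is not specified in this record.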