Consistent123: Improve Consistency for One Image to 3D Object Synthesis
Format: Article
Language: English
Abstract: Large image diffusion models enable novel view synthesis with high quality and excellent zero-shot capability. However, such models, based on image-to-image translation, offer no guarantee of view consistency, limiting their performance on downstream tasks such as 3D reconstruction and image-to-3D generation. To enforce consistency, we propose Consistent123, which synthesizes novel views simultaneously by incorporating additional cross-view attention layers and a shared self-attention mechanism. The proposed attention mechanism improves the interaction across all synthesized views, as well as the alignment between the condition view and the novel views. In the sampling stage, this architecture supports generating an arbitrary number of views simultaneously while training at a fixed length. We also introduce a progressive classifier-free guidance strategy to achieve a trade-off between texture and geometry in the synthesized object views. Qualitative and quantitative experiments show that Consistent123 outperforms baselines in view consistency by a large margin. Furthermore, we demonstrate a significant improvement from Consistent123 on various downstream tasks, showing its great potential in the 3D generation field. The project page is available at consistent-123.github.io.
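To make the attention design described in the abstract concrete, below is a minimal sketch, assuming per-view latent tokens, of shared self-attention in which the tokens of all views (including the condition view) are flattened into one sequence before standard self-attention is applied. Because the attention is length-agnostic, the same layer can process an arbitrary number of views at sampling time even if training used a fixed number. The class name CrossViewAttention, the tensor shapes, and the use of torch.nn.MultiheadAttention are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn


class CrossViewAttention(nn.Module):
    """Toy shared self-attention across views (illustrative sketch only)."""

    def __init__(self, dim: int, num_heads: int = 8):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (V, N, C) = (views, tokens per view, channels). View 0 may hold the
        # condition view, so its tokens are visible to every synthesized view.
        v, n, c = x.shape
        tokens = x.reshape(1, v * n, c)      # flatten all views into one sequence
        out, _ = self.attn(tokens, tokens, tokens)
        return out.reshape(v, n, c)


# Works for any number of views, e.g. 4 views of 64 tokens with 320 channels.
feats = torch.randn(4, 64, 320)
print(CrossViewAttention(dim=320)(feats).shape)  # torch.Size([4, 64, 320])
```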
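The progressive classifier-free guidance mentioned above can also be sketched. One plausible reading, offered here purely as an assumption, is a guidance weight that is scheduled across the sampling steps rather than held fixed, so that geometry and texture can be balanced over the denoising trajectory. The linear ramp and the names progressive_cfg_scale, w_start, and w_end are hypothetical; the paper's actual schedule may differ.

```python
import torch


def progressive_cfg_scale(step: int, num_steps: int,
                          w_start: float = 3.0, w_end: float = 7.5) -> float:
    """Linearly interpolate the guidance weight over the sampling steps.

    The direction and shape of this ramp are assumptions made for illustration.
    """
    t = step / max(num_steps - 1, 1)
    return w_start + t * (w_end - w_start)


def guided_noise(eps_cond: torch.Tensor, eps_uncond: torch.Tensor,
                 step: int, num_steps: int) -> torch.Tensor:
    """Standard classifier-free guidance mix, with a step-dependent weight."""
    w = progressive_cfg_scale(step, num_steps)
    return eps_uncond + w * (eps_cond - eps_uncond)
```

In a sampler, guided_noise would replace the fixed-scale guidance term at each denoising step; in this sketch the schedule is chosen so that early steps lean toward coherent geometry and later steps toward texture detail.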
DOI: 10.48550/arxiv.2310.08092