PLA4D: Pixel-Level Alignments for Text-to-4D Gaussian Splatting
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Previous text-to-4D methods have leveraged multiple Score Distillation
Sampling (SDS) techniques, combining motion priors from video-based diffusion
models (DMs) with geometric priors from multiview DMs to implicitly guide 4D
renderings. However, differences in these priors result in conflicting gradient
directions during optimization, causing trade-offs between motion fidelity and
geometry accuracy, and requiring substantial optimization time to reconcile the
models. In this paper, we introduce Pixel-Level Alignment for text-driven 4D
Gaussian splatting (PLA4D) to
resolve this motion-geometry conflict. PLA4D provides an anchor reference,
i.e., text-generated video, to align the rendering process conditioned by
different DMs in pixel space. For static alignment, our approach introduces a
focal alignment method and Gaussian-Mesh contrastive learning to iteratively
adjust focal lengths and provide explicit geometric priors at each timestep. At
the dynamic level, a motion alignment technique and T-MV refinement method are
employed to enforce both pose alignment and motion continuity across unknown
viewpoints, ensuring intrinsic geometric consistency across views. With such
pixel-level multi-DM alignment, our PLA4D framework is able to generate 4D
objects with superior geometric, motion, and semantic consistency. Fully
implemented with open-source tools, PLA4D offers an efficient and accessible
solution for high-quality 4D digital content creation with significantly
reduced generation time. |
DOI: | 10.48550/arxiv.2405.19957 |