Fast 3D Stylized Gaussian Portrait Generation From a Single Image With Style Aligned Sampling Loss

Bibliographic Details
Published in:	IEEE Access, 2024, Vol. 12, pp. 58651-58660
Main authors:	Jiang, Shangming; Yu, Xinyou; Guo, Weijun; Huang, Junling
Format:	Article
Language:	English
Subjects:	3D generation; Accuracy; Avatars; Cloning; Consistency; Diffusion model; Diffusion processes; Gaussian processes; Gaussian splatting; Image color analysis; Noise; Point cloud compression; Rendering (computer graphics); Sampling; Surface treatment; Three dimensional models; Three-dimensional displays; Virtual reality
Online access:	Full text

Description
Abstract:	Creating stylized 3D avatars and portraits from just a single input image is an emerging challenge in augmented and virtual reality. While prior work has explored 2D stylization or 3D avatar generation, achieving high-fidelity 3D stylized portraits with text control remains an open problem. In this paper, we present an efficient approach for generating high-quality 3D stylized portraits directly from a single input image. Our core representation is based on 3D Gaussian Splatting for efficient rendering, along with a surface-guided splitting and cloning strategy to reduce noise. To achieve high-fidelity stylized results, we introduce a Stylized Generation Module with a Style-Aligned Sampling Loss that injects the input image's identity information into the diffusion model while stabilizing the stylization process. Furthermore, we incorporate a multi-view diffusion model to enforce 3D consistency by generating multiple viewpoints. Extensive experiments demonstrate that our approach outperforms existing methods in terms of stylization quality, 3D consistency, and user preference ratings. Our framework enables casual users to easily generate stylized 3D portraits through simple image or text inputs, facilitating engaging experiences in AR/VR applications.
ISSN:	2169-3536
DOI:	10.1109/ACCESS.2024.3392568