DivAvatar: Diverse 3D Avatar Generation with a Single Prompt
Format: Article
Language: English
Online access: Order full text
Abstract: Text-to-Avatar generation has recently made significant strides due to advancements in diffusion models. However, most existing work remains constrained by limited diversity, producing avatars with only subtle differences in appearance for a given text prompt. We design DivAvatar, a novel framework that generates diverse avatars, empowering 3D creatives with a multitude of distinct and richly varied 3D avatars from a single text prompt. Unlike most existing work, which exploits scene-specific 3D representations such as NeRF, DivAvatar fine-tunes a 3D generative model (i.e., EVA3D), allowing diverse avatar generation simply from noise sampling at inference time. DivAvatar has two key designs that help achieve generation diversity and visual quality. The first is a noise sampling technique during the training phase, which is critical for generating diverse appearances. The second is a semantic-aware zoom mechanism paired with a novel depth loss: the former produces appearances of high textual fidelity by fine-tuning specific body parts separately, and the latter greatly improves geometry quality by smoothing the generated mesh in the feature space. Extensive experiments show that DivAvatar is highly versatile in generating avatars of diverse appearances.
DOI: 10.48550/arxiv.2402.17292
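
The abstract's central claim is that, once the generative model has been fine-tuned for a prompt, diversity comes from nothing more than drawing fresh latent noise at inference time. The minimal sketch below illustrates that idea; the generator is a hypothetical stand-in for a fine-tuned EVA3D-style network, not the authors' code, and the latent dimension and output format are assumptions.

```python
import torch
import torch.nn as nn

LATENT_DIM = 512  # assumed latent size; the real model's value may differ

class AvatarGenerator(nn.Module):
    """Hypothetical stand-in for a prompt-fine-tuned EVA3D-style generator:
    maps a latent code z to an avatar representation (here just a flat
    feature vector; the real output is a 3D human representation)."""
    def __init__(self, latent_dim: int = LATENT_DIM, out_dim: int = 1024):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim, 1024),
            nn.ReLU(),
            nn.Linear(1024, out_dim),
        )

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        return self.net(z)

@torch.no_grad()
def sample_diverse_avatars(generator: nn.Module, num_avatars: int,
                           latent_dim: int = LATENT_DIM) -> torch.Tensor:
    # Each independent Gaussian latent code yields a distinct avatar:
    # the text prompt is fixed, so diversity comes entirely from z.
    z = torch.randn(num_avatars, latent_dim)
    return generator(z)

generator = AvatarGenerator()  # stands in for the fine-tuned model
avatars = sample_diverse_avatars(generator, num_avatars=4)
print(avatars.shape)  # torch.Size([4, 1024])
```

No per-avatar optimization is needed at this point: one fine-tuning run per prompt, then arbitrarily many avatars by resampling z.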
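The abstract describes the novel depth loss only as smoothing the generated mesh in the feature space, without giving its formulation. For reference, the sketch below shows a generic pixel-space depth smoothness penalty on rendered depth maps, a common baseline for this kind of geometry regularizer; it is not the paper's actual feature-space loss.

```python
import torch

def depth_smoothness_loss(depth: torch.Tensor) -> torch.Tensor:
    """Generic smoothness penalty on a batch of rendered depth maps of
    shape (B, H, W): penalizes depth differences between adjacent pixels.
    A common geometry regularizer, NOT the feature-space depth loss
    proposed in the paper."""
    dx = (depth[:, :, 1:] - depth[:, :, :-1]).abs().mean()  # horizontal neighbors
    dy = (depth[:, 1:, :] - depth[:, :-1, :]).abs().mean()  # vertical neighbors
    return dx + dy

# Example: a random 2-image batch of 64x64 depth maps
loss = depth_smoothness_loss(torch.rand(2, 64, 64))
print(loss.item())
```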