AvatarStudio: Text-driven Editing of 3D Dynamic Human Head Avatars
Main authors:
Format: Article
Language: English
Online access: Order full text
Abstract: Capturing and editing full head performances enables the creation of virtual characters with various applications such as extended reality and media production. The past few years have witnessed a steep rise in the photorealism of human head avatars. Such avatars can be controlled through different input data modalities, including RGB, audio, depth, IMUs, and others. While these data modalities provide effective means of control, they mostly focus on editing head motion, such as facial expressions, head pose, and/or camera viewpoint. In this paper, we propose AvatarStudio, a text-based method for editing the appearance of a dynamic full head avatar. Our approach builds on existing work to capture dynamic performances of human heads using a neural radiance field (NeRF) and edits this representation with a text-to-image diffusion model. Specifically, we introduce an optimization strategy for incorporating multiple keyframes representing different camera viewpoints and timestamps of a video performance into a single diffusion model. Using this personalized diffusion model, we edit the dynamic NeRF by introducing view-and-time-aware Score Distillation Sampling (VT-SDS) following a model-based guidance approach. Our method edits the full head in a canonical space, and then propagates these edits to the remaining time steps via a pretrained deformation network. We evaluate our method visually and numerically via a user study, and the results show that our method outperforms existing approaches. Our experiments validate the design choices of our method and highlight that our edits are genuine and personalized, as well as 3D- and time-consistent.
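As a rough sketch of what view-and-time-aware Score Distillation Sampling could look like, the standard SDS gradient used for diffusion-guided 3D optimization can be conditioned on the rendering viewpoint and timestamp. The notation below (viewpoint $\pi$, timestamp $\tau$, renderer $g$) is illustrative and assumed, not taken from the paper:

\[
\nabla_{\theta}\,\mathcal{L}_{\text{VT-SDS}} \;\approx\; \mathbb{E}_{t,\epsilon}\!\left[\, w(t)\,\big(\hat{\epsilon}_{\phi}(x_t;\, y, \pi, \tau, t) - \epsilon\big)\,\frac{\partial x}{\partial \theta}\right],
\qquad x = g(\theta, \pi, \tau),\quad x_t = \sqrt{\bar{\alpha}_t}\,x + \sqrt{1-\bar{\alpha}_t}\,\epsilon,
\]

where $\theta$ are the parameters of the dynamic NeRF, $y$ is the editing text prompt, $\hat{\epsilon}_{\phi}$ is the noise prediction of the personalized diffusion model, $w(t)$ weights the noise levels, and the expectation is taken over diffusion timesteps $t$ and Gaussian noise $\epsilon$. Because the edit itself is applied in a canonical space, the same edited field is queried through the pretrained deformation network at every time step of the performance, which is what keeps the edit consistent over time.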
DOI: 10.48550/arxiv.2306.00547