MimicTalk: Mimicking a personalized and expressive 3D talking face in minutes
| Main authors: | |
|---|---|
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online access: | Order full text |
Abstract: Talking face generation (TFG) aims to animate a target identity's face to create realistic talking videos. Personalized TFG is a variant that emphasizes the perceptual identity similarity of the synthesized result, in terms of both appearance and talking style. While previous works typically solve this problem by learning an individual neural radiance field (NeRF) for each identity to implicitly store its static and dynamic information, we find this approach inefficient and poorly generalized due to its per-identity-per-training framework and the limited training data. To this end, we propose MimicTalk, the first attempt to exploit the rich knowledge of a NeRF-based person-agnostic generic model to improve the efficiency and robustness of personalized TFG. Specifically, (1) we first build a person-agnostic 3D TFG model as the base model and propose to adapt it to a specific identity; (2) we propose a static-dynamic-hybrid adaptation pipeline that helps the model learn the personalized static appearance and facial dynamic features; and (3) to generate facial motion in the personalized talking style, we propose an in-context stylized audio-to-motion model that mimics the implicit talking style provided in the reference video, avoiding the information loss of an explicit style representation. Adaptation to an unseen identity can be performed in 15 minutes, which is 47 times faster than previous person-dependent methods. Experiments show that MimicTalk surpasses previous baselines in video quality, efficiency, and expressiveness. Source code and video samples are available at https://mimictalk.github.io.
DOI: 10.48550/arxiv.2410.06734
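As a rough illustration of the per-identity adaptation idea summarized in the abstract (fine-tuning a pretrained person-agnostic model on a short reference clip so that only appearance- and motion-related components are updated), the following is a minimal Python/PyTorch sketch. All module names, shapes, and the loss are assumptions for illustration only, not the authors' implementation; the actual method uses a NeRF-based renderer and a dedicated static-dynamic-hybrid pipeline described in the paper.

```python
# Hypothetical sketch: adapt a pretrained person-agnostic talking-face model to one identity
# by tuning only its static-appearance and dynamic-motion parameters on a short reference clip.
import torch
import torch.nn as nn

class GenericTFGModel(nn.Module):
    """Stand-in for a pretrained person-agnostic talking-face generator (not the real model)."""
    def __init__(self, feat_dim=256):
        super().__init__()
        self.static_appearance = nn.Linear(feat_dim, feat_dim)              # identity appearance branch
        self.dynamic_motion = nn.GRU(feat_dim, feat_dim, batch_first=True)  # facial-dynamics branch
        self.renderer = nn.Linear(feat_dim, 3)                              # placeholder for the renderer

    def forward(self, motion_feats):
        h, _ = self.dynamic_motion(motion_feats)
        return self.renderer(self.static_appearance(h))

def adapt_to_identity(model, ref_frames, ref_motion, steps=500, lr=1e-4):
    """Hybrid adaptation sketch: freeze the generic model, then tune only the
    appearance- and motion-related parameters on the reference clip."""
    for p in model.parameters():
        p.requires_grad_(False)
    tunable = list(model.static_appearance.parameters()) + list(model.dynamic_motion.parameters())
    for p in tunable:
        p.requires_grad_(True)
    opt = torch.optim.Adam(tunable, lr=lr)
    for _ in range(steps):
        opt.zero_grad()
        pred = model(ref_motion)                             # render frames from reference motion
        loss = nn.functional.l1_loss(pred, ref_frames)       # match the target identity's appearance
        loss.backward()
        opt.step()
    return model

# Toy usage: a 25-frame reference clip with 256-d motion features and per-frame RGB targets.
model = GenericTFGModel()
ref_motion = torch.randn(1, 25, 256)
ref_frames = torch.randn(1, 25, 3)
adapt_to_identity(model, ref_frames, ref_motion, steps=10)
```

The point of the sketch is the division of labor: the generic model supplies prior knowledge shared across identities, while the short adaptation step only has to absorb person-specific appearance and dynamics, which is what makes a minutes-long adaptation plausible.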