EDTalk: Efficient Disentanglement for Emotional Talking Head Synthesis
Format: Article
Language: English
Abstract: Achieving disentangled control over multiple facial motions and
accommodating diverse input modalities greatly enhances the applicability and
entertainment value of talking head generation. This necessitates a deep
exploration of the decoupling space for facial features, ensuring that they
a) operate independently without mutual interference and b) can be preserved
and shared across different modal inputs, both aspects often neglected in
existing methods. To address this gap, this paper proposes a novel Efficient
Disentanglement framework for Talking head generation (EDTalk). Our framework
enables individual manipulation of mouth shape, head pose, and emotional
expression, conditioned on video or audio inputs. Specifically, we employ
three lightweight modules to decompose the facial dynamics into three distinct
latent spaces representing mouth, pose, and expression, respectively. Each
space is characterized by a set of learnable bases whose linear combinations
define specific motions. To ensure independence and accelerate training, we
enforce orthogonality among the bases and devise an efficient training
strategy that allocates motion responsibilities to each space without relying
on external knowledge. The learned bases are then stored in corresponding
banks, enabling shared visual priors with audio input. Furthermore,
considering the properties of each space, we propose an Audio-to-Motion module
for audio-driven talking head synthesis. Experiments demonstrate the
effectiveness of EDTalk. We recommend visiting the project website:
https://tanshuai0219.github.io/EDTalk/
DOI: 10.48550/arxiv.2404.01647
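The abstract describes each motion space as a bank of learnable bases whose linear combinations define specific motions, with orthogonality enforced to keep the bases independent. The record contains no code; the following is a minimal, hypothetical sketch of that idea in PyTorch. All names (MotionBank, num_bases, the softmax stand-in for an encoder) are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch: a bank of learnable bases whose linear combinations
# define a motion latent, with an orthogonality penalty encouraging the
# bases to stay independent. Names and shapes are assumptions for illustration.
import torch
import torch.nn as nn

class MotionBank(nn.Module):
    def __init__(self, num_bases: int = 20, dim: int = 512):
        super().__init__()
        # Learnable bases spanning one motion space (e.g. mouth, pose, or expression).
        self.bases = nn.Parameter(torch.randn(num_bases, dim) * 0.01)

    def forward(self, weights: torch.Tensor) -> torch.Tensor:
        # weights: (batch, num_bases), predicted from a video frame or audio feature.
        # A specific motion is a linear combination of the stored bases.
        return weights @ self.bases  # (batch, dim)

    def orthogonality_loss(self) -> torch.Tensor:
        # Push the Gram matrix of the normalized bases toward the identity,
        # so different bases encode non-interfering motion components.
        b = nn.functional.normalize(self.bases, dim=-1)
        gram = b @ b.t()
        eye = torch.eye(gram.size(0), device=gram.device)
        return ((gram - eye) ** 2).mean()

# Usage sketch: separate banks would keep mouth, pose, and expression disentangled.
mouth_bank = MotionBank()
weights = torch.softmax(torch.randn(4, 20), dim=-1)  # stand-in for an encoder output
mouth_latent = mouth_bank(weights)                    # (4, 512) motion latent
loss = mouth_bank.orthogonality_loss()
```

In this reading, sharing the same banks between video-derived and audio-derived weights is what lets the visual priors carry over to audio-driven synthesis, as the abstract states; the details of the Audio-to-Motion module are not specified here.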