Weakly-Supervised Emotion Transition Learning for Diverse 3D Co-speech Gesture Generation
Format: Article
Language: English
Abstract: Generating vivid and emotional 3D co-speech gestures is crucial for virtual avatar animation in human-machine interaction applications. While existing methods can generate gestures conditioned on a single emotion label, they overlook that modeling long gesture sequences with emotion transitions is more practical in real-world scenarios. In addition, the lack of large-scale datasets containing emotion-transition speech and corresponding 3D human gestures further limits progress on this task. To fulfill this goal, we first incorporate ChatGPT-4 and an audio inpainting approach to construct high-fidelity emotion-transition human speech. Since obtaining realistic 3D pose annotations for the dynamically inpainted emotion-transition audio is extremely difficult, we propose a novel weakly supervised training strategy to encourage authentic gesture transitions. Specifically, to enhance the coordination of transition gestures with the two surrounding emotional gestures, we model the temporal association representation between the two different emotional gesture sequences as style guidance and infuse it into the transition generation. We further devise an emotion mixture mechanism that provides weak supervision for transition gestures based on a learnable mixed emotion label. Finally, we present a keyframe sampler that supplies effective initial posture cues in long sequences, enabling us to generate diverse gestures. Extensive experiments demonstrate that our method outperforms state-of-the-art models, constructed by adapting single-emotion-conditioned counterparts, on our newly defined emotion transition task and datasets. Our code and dataset will be released on the project page: https://xingqunqi-lab.github.io/Emo-Transition-Gesture/.
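To make the weak-supervision idea above more concrete, the following is a minimal PyTorch sketch of one plausible reading of the emotion mixture mechanism: an emotion classifier scores the generated transition frames, and the loss compares those scores against a soft target that learnably blends the source and target emotion labels. All module, function, and variable names are illustrative assumptions, not the authors' released implementation.

```python
# Hypothetical sketch of emotion-mixture weak supervision for transition frames.
# Assumption: a (frozen) emotion classifier produces per-frame logits for the
# generated transition gestures; the soft target interpolates the two labels.
import torch
import torch.nn as nn
import torch.nn.functional as F

class EmotionMixtureSupervision(nn.Module):
    def __init__(self, num_transition_frames: int):
        super().__init__()
        # One learnable logit per transition frame; its sigmoid acts as a
        # blending weight moving from the source toward the target emotion.
        self.mix_logits = nn.Parameter(torch.zeros(num_transition_frames))

    def mixed_labels(self, src_label: torch.Tensor, tgt_label: torch.Tensor) -> torch.Tensor:
        # src_label, tgt_label: one-hot emotion vectors of shape (num_emotions,)
        alpha = torch.sigmoid(self.mix_logits).unsqueeze(-1)      # (T, 1)
        return (1.0 - alpha) * src_label + alpha * tgt_label      # (T, num_emotions)

    def forward(self, emotion_logits: torch.Tensor,
                src_label: torch.Tensor, tgt_label: torch.Tensor) -> torch.Tensor:
        # emotion_logits: (T, num_emotions) classifier outputs on transition frames.
        soft_targets = self.mixed_labels(src_label, tgt_label)
        log_probs = F.log_softmax(emotion_logits, dim=-1)
        # Soft cross-entropy against the learnable mixed label: the weak signal.
        return -(soft_targets * log_probs).sum(dim=-1).mean()
```

In this sketch the per-frame blending weight is free to be learned; in practice one might also regularize it toward a monotone schedule so the mixed label moves smoothly from the source to the target emotion across the transition segment.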
DOI: 10.48550/arxiv.2311.17532