The DiffuseStyleGesture+ entry to the GENEA Challenge 2023
Main authors: | , , , , , , , |
---|---|
Format: | Article |
Language: | English |
Abstract: | In this paper, we introduce DiffuseStyleGesture+, our solution for the
Generation and Evaluation of Non-verbal Behavior for Embodied Agents (GENEA)
Challenge 2023, which aims to foster the development of realistic, automated
systems for generating conversational gestures. Participants are provided with
a pre-processed dataset, and their systems are evaluated through crowdsourced
scoring. Our proposed model, DiffuseStyleGesture+, leverages a diffusion model
to generate gestures automatically. It incorporates a variety of modalities,
including audio, text, speaker ID, and seed gestures. These diverse modalities
are mapped to a hidden space and processed by a modified diffusion model to
produce the corresponding gesture for a given speech input. In the evaluation,
DiffuseStyleGesture+ performed on par with the top-tier models in the
challenge: it showed no significant differences from those models in
human-likeness or appropriateness for the interlocutor, and achieved
competitive performance with the best model on appropriateness for agent
speech. This indicates that our model is competitive and effective in
generating realistic and appropriate gestures for given speech. The code,
pre-trained models, and demos are available at
https://github.com/YoungSeng/DiffuseStyleGesture/tree/DiffuseStyleGesturePlus/BEAT-TWH-main. |
---|---|
DOI: | 10.48550/arxiv.2308.13879 |
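The abstract describes mapping several modalities (audio, text, speaker ID, seed gestures) into a shared hidden space that conditions a diffusion model. The following is a minimal, illustrative sketch of that idea only; the feature dimensions, projection weights, and the toy denoising loop are assumptions for demonstration, not the paper's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)
HIDDEN = 64  # assumed shared hidden-space dimensionality

def project(x, w):
    """Map a modality feature vector into the shared hidden space."""
    return x @ w

# Toy per-modality features; the sizes are illustrative assumptions.
audio   = rng.normal(size=128)  # stand-in for an audio feature vector
text    = rng.normal(size=32)   # stand-in for a text embedding
speaker = rng.normal(size=8)    # stand-in for a speaker-ID embedding
seed    = rng.normal(size=16)   # stand-in for a seed-gesture summary

# One random projection per modality into the shared hidden space.
W = {name: rng.normal(size=(d, HIDDEN)) * 0.1
     for name, d in [("audio", 128), ("text", 32),
                     ("speaker", 8), ("seed", 16)]}

# Fuse the projected modalities into a single conditioning vector.
cond = (project(audio, W["audio"]) + project(text, W["text"])
        + project(speaker, W["speaker"]) + project(seed, W["seed"]))

def toy_denoise(cond, steps=50):
    """Toy reverse-diffusion loop: start from Gaussian noise and move
    the sample toward a condition-dependent stand-in prediction."""
    x = rng.normal(size=HIDDEN)               # x_T ~ N(0, I)
    for t in range(steps, 0, -1):
        pred = np.tanh(cond)                  # stand-in for the network's clean-sample prediction
        alpha = t / steps
        x = alpha * x + (1 - alpha) * pred    # interpolate toward the prediction
    return x

gesture_latent = toy_denoise(cond)
print(gesture_latent.shape)  # (64,)
```

In the actual system, the stand-in prediction would be a learned neural network conditioned on the fused features at every timestep; the sketch only shows the data flow from multimodal inputs to a denoised gesture representation.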