BiPO: Bidirectional Partial Occlusion Network for Text-to-Motion Synthesis
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Order full text |
Summary: | Generating natural and expressive human motions from textual descriptions is
challenging due to the complexity of coordinating full-body dynamics and
capturing nuanced motion patterns over extended sequences that accurately
reflect the given text. To address this, we introduce BiPO, Bidirectional
Partial Occlusion Network for Text-to-Motion Synthesis, a novel model that
enhances text-to-motion synthesis by integrating part-based generation with a
bidirectional autoregressive architecture. This integration allows BiPO to
consider both past and future contexts during generation while enhancing
detailed control over individual body parts without requiring ground-truth
motion length. To relax the interdependency among body parts caused by the
integration, we devise the Partial Occlusion technique, which probabilistically
occludes certain motion-part information during training. In our
comprehensive experiments, BiPO achieves state-of-the-art performance on the
HumanML3D dataset, outperforming recent methods such as ParCo, MoMask, and BAMM
in terms of FID scores and overall motion quality. Notably, BiPO excels not
only in the text-to-motion generation task but also in motion editing tasks
that synthesize motion based on partially generated motion sequences and
textual descriptions. These results demonstrate BiPO's effectiveness in
advancing text-to-motion synthesis and its potential for practical
applications. |
---|---|
DOI: | 10.48550/arxiv.2412.00112 |
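
The summary describes Partial Occlusion only at a high level: during training, information from certain body parts is probabilistically hidden so that each part depends less on the others. The following is a minimal sketch of that idea, not the authors' implementation. The function name `occlude_parts`, the part names, the zero-valued mask, and the rule that the part being generated is never occluded are all assumptions made for illustration.

```python
import random
from typing import Dict, List, Optional


def occlude_parts(
    parts: Dict[str, List[float]],
    target: str,
    p_occlude: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Dict[str, List[float]]:
    """Return a copy of `parts` in which each non-target part is
    independently replaced by zeros with probability `p_occlude`.

    Hypothetical illustration: the record does not specify how BiPO
    represents or masks body-part features.
    """
    rng = rng or random.Random()
    occluded: Dict[str, List[float]] = {}
    for name, features in parts.items():
        # Assumption: the part currently being generated stays visible.
        if name != target and rng.random() < p_occlude:
            occluded[name] = [0.0] * len(features)  # hide this part
        else:
            occluded[name] = list(features)  # keep an unmodified copy
    return occluded


if __name__ == "__main__":
    # Toy per-part features; real motion features would be much larger.
    example = {"arm": [1.0, 2.0], "leg": [3.0], "torso": [4.0, 5.0]}
    print(occlude_parts(example, target="arm", p_occlude=0.5,
                        rng=random.Random(0)))
```

In this sketch, `p_occlude = 0.0` leaves every part visible and `p_occlude = 1.0` hides every part except the target, so the parameter interpolates between fully coupled and fully independent part conditioning.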