Learning Uniformly Distributed Embedding Clusters of Stylistic Skills for Physically Simulated Characters

Bibliographic details
Main authors: Liu, Nian; Liu, Libin; Zhang, Zilong; Wang, Zi; Xie, Hongzhao; Liu, Tengyu; Tong, Xinyi; Yang, Yaodong; He, Zhaofeng
Format: Article
Language: English
Description
Summary: Learning natural and diverse behaviors from human motion datasets remains challenging in physics-based character control. Existing conditional adversarial models often suffer from tight and biased embedding distributions, where embeddings from the same motion are closely grouped in a small area and shorter motions occupy even less space. Our empirical observations indicate that this limits the representational capacity and diversity under each skill. An ideal latent space should be maximally packed with the embedding clusters of all motions. In this paper, we propose a skill-conditioned controller that learns diverse skills with expressive variations. Our approach leverages the Neural Collapse phenomenon, a natural outcome of the classification-based encoder, to uniformly distribute cluster centers. We additionally propose a novel Embedding Expansion technique to form stylistic embedding clusters for diverse skills that are uniformly distributed on a hypersphere, maximizing the representational area occupied by each skill and minimizing unmapped regions. This maximally packed and uniformly distributed embedding space ensures that embeddings within the same cluster generate behaviors that conform to the characteristics of the corresponding motion clips yet exhibit noticeable variations within each cluster. Compared to existing methods, our controller not only generates high-quality, diverse motions covering the entire dataset but also achieves superior controllability, motion coverage, and diversity under each skill. Both qualitative and quantitative results confirm these traits, enabling our controller to be applied to a wide range of downstream tasks and to serve as a cornerstone for diverse applications.
DOI: 10.48550/arxiv.2411.06459
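
The summary refers to cluster centers that are uniformly distributed on a hypersphere as a result of Neural Collapse. In the Neural Collapse literature, class means tend toward a simplex equiangular tight frame, in which every pair of centers has the same cosine similarity of -1/(K-1) for K classes. The following is a minimal sketch of that geometry only, assuming NumPy; the function name `simplex_etf` is hypothetical, and the code does not reproduce the authors' encoder or their Embedding Expansion technique.

```python
import numpy as np


def simplex_etf(num_classes: int) -> np.ndarray:
    """Return K unit vectors (rows) forming a simplex equiangular tight frame.

    Illustrative construction only (an assumption, not the paper's code):
    center the K x K identity matrix, then rescale each row to unit length.
    """
    k = num_classes
    if k < 2:
        raise ValueError("need at least two classes")
    centered = np.eye(k) - np.ones((k, k)) / k
    return centered / np.linalg.norm(centered, axis=1, keepdims=True)


# Example: four skill centers on the unit hypersphere.
centers = simplex_etf(4)
cosines = centers @ centers.T  # pairwise cosine similarities
```

With four centers, the diagonal of `cosines` is 1 and every off-diagonal entry equals -1/3, so no pair of centers is closer together than any other pair.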