HIMO: A New Benchmark for Full-Body Human Interacting with Multiple Objects
Format: | Article |
Language: | English |
Abstract: | Generating human-object interactions (HOIs) is critical given the tremendous
advances in digital avatars. Existing datasets are typically limited to humans
interacting with a single object, neglecting the ubiquitous manipulation
of multiple objects. We therefore propose HIMO, a large-scale MoCap dataset of
full-body humans interacting with multiple objects, containing 3.3K 4D HOI
sequences and 4.08M 3D HOI frames. We also annotate HIMO with detailed textual
descriptions and temporal segments, and use them to benchmark two novel tasks:
HOI synthesis conditioned on either the whole text prompt or the segmented text
prompts, the latter serving as fine-grained timeline control. To address these
tasks, we propose a dual-branch conditional diffusion model with a mutual
interaction module for HOI synthesis. In addition, we design an auto-regressive
generation pipeline to obtain smooth transitions between HOI segments.
Experimental results demonstrate generalization to unseen object geometries and
temporal compositions. |
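The abstract does not say how the auto-regressive pipeline produces smooth transitions between HOI segments. The sketch below is a hypothetical illustration of one common approach, not the authors' method: each segment is sampled while conditioned on the tail of the previous one, and the shared window is cross-faded. `sample_segment` is a placeholder for a text-conditioned sampler (such as a diffusion model). The linear cross-fade, the per-frame vector layout, and every name in the code are assumptions.

```python
from typing import Callable, List, Sequence

# Hypothetical frame layout: one flat vector of human pose and object states per frame.
Frame = List[float]
# Hypothetical sampler signature: (text prompt, conditioning prefix frames) -> new frames.
SegmentSampler = Callable[[str, Sequence[Frame]], List[Frame]]


def linear_blend(prev: Sequence[Frame], nxt: Sequence[Frame]) -> List[Frame]:
    """Cross-fade two frame windows, ramping the weight from prev toward nxt."""
    n = len(prev)
    blended = []
    for i in range(n):
        w = (i + 1) / (n + 1)  # weight of the incoming segment grows along the window
        blended.append([(1.0 - w) * a + w * b for a, b in zip(prev[i], nxt[i])])
    return blended


def generate_autoregressive(
    prompts: Sequence[str], sample_segment: SegmentSampler, overlap: int
) -> List[Frame]:
    """Generate one segment per prompt, each conditioned on the previous segment's tail.

    Assumes every sampled segment has at least `overlap` frames.
    """
    timeline: List[Frame] = []
    for prompt in prompts:
        # The last `overlap` frames (or fewer, early on) condition the next segment.
        prefix = timeline[-overlap:] if timeline and overlap > 0 else []
        segment = sample_segment(prompt, prefix)
        k = len(prefix)
        if k > 0:
            # Replace the shared window with a cross-fade, then append the remainder.
            timeline[-k:] = linear_blend(prefix, segment[:k])
            timeline.extend(segment[k:])
        else:
            timeline.extend(segment)
    return timeline


def toy_sampler(prompt: str, prefix: Sequence[Frame]) -> List[Frame]:
    """Stand-in for a real sampler: ignores the prefix, emits 4 constant frames."""
    value = float(len(prompt))
    return [[value] for _ in range(4)]
```

For example, `generate_autoregressive(["a", "bbb"], toy_sampler, overlap=2)` yields six frames whose values ramp from 1.0 to 3.0 across the two-frame shared window instead of jumping abruptly. A learned model would use the prefix as conditioning rather than ignoring it as the toy sampler does.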
DOI: | 10.48550/arxiv.2407.12371 |