Multi-HMR: Multi-Person Whole-Body Human Mesh Recovery in a Single Shot
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: We present Multi-HMR, a strong single-shot model for multi-person 3D human
mesh recovery from a single RGB image. Predictions encompass the whole body,
i.e., including hands and facial expressions, using the SMPL-X parametric model
and 3D location in the camera coordinate system. Our model detects people by
predicting coarse 2D heatmaps of person locations, using features produced by a
standard Vision Transformer (ViT) backbone. It then predicts their whole-body
pose, shape and 3D location using a new cross-attention module called the Human
Prediction Head (HPH), with one query attending to the entire set of features
for each detected person. As direct prediction of fine-grained hands and facial
poses in a single shot, i.e., without relying on explicit crops around body
parts, is hard to learn from existing data, we introduce CUFFS, the Close-Up
Frames of Full-Body Subjects dataset, containing humans close to the camera
with diverse hand poses. We show that incorporating it into the training data
further enhances predictions, particularly for hands. Multi-HMR also optionally
accounts for camera intrinsics, if available, by encoding camera ray directions
for each image token. This simple design achieves strong performance on
whole-body and body-only benchmarks simultaneously: a ViT-S backbone on
$448{\times}448$ images already yields a fast and competitive model, while
larger models and higher resolutions obtain state-of-the-art results.
DOI: 10.48550/arxiv.2402.14654
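
To make the architecture described in the abstract concrete, below is a minimal, hypothetical sketch (not the authors' released code) of two of its ingredients: a cross-attention prediction head in which one query per detected person attends to the full set of ViT image tokens to regress SMPL-X parameters and a 3D location, and a per-patch camera ray-direction embedding as one way of encoding known intrinsics. All module names, dimensions (e.g. `dim=768`, `smplx_param_dim=179`, patch size 14), and the exact ray encoding are assumptions chosen for illustration.

```python
# Minimal, hypothetical sketch of the ideas described in the abstract; all
# names and dimensions are assumptions, not the authors' implementation.
import torch
import torch.nn as nn
import torch.nn.functional as F


class HumanPredictionHeadSketch(nn.Module):
    """One query per detected person cross-attends to all ViT image tokens,
    then regresses SMPL-X parameters and a 3D location in camera space."""

    def __init__(self, dim=768, num_heads=8, smplx_param_dim=179):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        self.param_head = nn.Linear(dim, smplx_param_dim)  # pose/shape/expression
        self.loc_head = nn.Linear(dim, 3)                  # 3D location in camera space

    def forward(self, person_queries, image_tokens):
        # person_queries: (B, P, dim), e.g. backbone features sampled at the
        # peaks of the coarse 2D person heatmap; image_tokens: (B, N, dim).
        attended, _ = self.cross_attn(person_queries, image_tokens, image_tokens)
        feats = self.norm(person_queries + attended)
        return self.param_head(feats), self.loc_head(feats)


def camera_ray_embedding(fx, fy, cx, cy, grid_h, grid_w, patch=14):
    """Unit ray direction per patch token: one simple way to expose known
    camera intrinsics to the network (the paper's exact encoding may differ)."""
    ys, xs = torch.meshgrid(
        (torch.arange(grid_h) + 0.5) * patch,
        (torch.arange(grid_w) + 0.5) * patch,
        indexing="ij",
    )
    dirs = torch.stack([(xs - cx) / fx, (ys - cy) / fy, torch.ones_like(xs)], dim=-1)
    return F.normalize(dirs, dim=-1).reshape(-1, 3)  # (grid_h * grid_w, 3)


if __name__ == "__main__":
    B, P, dim = 1, 2, 768
    n_tokens = (448 // 14) ** 2                      # 32 x 32 patch grid
    head = HumanPredictionHeadSketch(dim=dim)
    params, locs = head(torch.randn(B, P, dim), torch.randn(B, n_tokens, dim))
    rays = camera_ray_embedding(600.0, 600.0, 224.0, 224.0, 32, 32)
    print(params.shape, locs.shape, rays.shape)      # (1, 2, 179) (1, 2, 3) (1024, 3)
```

In the design the abstract outlines, the person queries would come from locations detected on the coarse 2D heatmap, and a ray-direction embedding like the one above would be attached to the corresponding image tokens only when camera intrinsics are available.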