CondiMen: Conditional Multi-Person Mesh Recovery
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | English |
Abstract: | Multi-person human mesh recovery (HMR) consists of detecting all individuals
in a given input image and predicting the body shape, pose, and 3D location
of each detected person. The dominant approaches to this task rely on neural
networks trained to output a single prediction for each detected individual. In
contrast, we propose CondiMen, a method that uses a Bayesian network to output a
joint parametric distribution over likely poses, body shapes, camera intrinsics,
and distances to the camera. This approach offers several advantages.
First, a probability distribution can handle some inherent ambiguities of this
task, such as the uncertainty between a person's size and their distance to
the camera, or simply the loss of information when projecting 3D data onto the
2D image plane. Second, the output distribution can be combined with additional
information to produce better predictions, for example by using known camera or
body shape parameters, or by exploiting multi-view observations. Third, the most
likely predictions can be extracted efficiently from the output distribution,
making the proposed approach suitable for real-time applications. Empirically,
we find that our model i) achieves performance on par with or better than the
state of the art, ii) captures uncertainties and correlations inherent in pose
estimation, and iii) can exploit additional information at test time, such as
multi-view consistency or body shape priors. CondiMen spices up the modeling of
ambiguity, using just the right ingredients on hand. |
DOI: | 10.48550/arxiv.2412.13058 |
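The size/distance ambiguity mentioned in the abstract, and the way extra knowledge such as a known body shape can resolve it, can be illustrated with a toy pinhole model. The sketch below is hypothetical and is not CondiMen's actual parameterization: it assumes a joint Gaussian over (log body height, log distance), an assumed 1000 px focal length, an assumed 500 px observed image height, and made-up prior and noise scales. Conditioning the Gaussian on a known height (standard Gaussian conditioning) pins down the distance.

```python
import numpy as np


def projected_height(focal_px, height_m, distance_m):
    """Pinhole projection: image height (px) of a vertical segment at a given distance."""
    return focal_px * height_m / distance_m


def condition_gaussian(mean, cov, known_idx, known_values):
    """Condition a joint Gaussian N(mean, cov) on some coordinates taking known values.

    Returns the mean and covariance of the remaining (free) coordinates.
    """
    mean = np.asarray(mean, dtype=float)
    cov = np.asarray(cov, dtype=float)
    known_idx = np.asarray(known_idx)
    free_idx = np.setdiff1d(np.arange(mean.size), known_idx)
    s_aa = cov[np.ix_(free_idx, free_idx)]
    s_ab = cov[np.ix_(free_idx, known_idx)]
    s_bb = cov[np.ix_(known_idx, known_idx)]
    # gain = S_ab S_bb^{-1}, computed with a linear solve instead of an explicit inverse
    gain = np.linalg.solve(s_bb, s_ab.T).T
    residual = np.asarray(known_values, dtype=float) - mean[known_idx]
    cond_mean = mean[free_idx] + gain @ residual
    cond_cov = s_aa - gain @ s_ab.T
    return cond_mean, cond_cov


# 1) Ambiguity: a 1.7 m person at 3.4 m and a 3.4 m figure at 6.8 m project identically.
f = 1000.0  # assumed focal length in pixels
print(projected_height(f, 1.7, 3.4), projected_height(f, 3.4, 6.8))

# 2) Hypothetical joint Gaussian over (log height, log distance) given a 500 px observation.
obs_px = 500.0
mu_u, s_u = np.log(1.7), 0.10  # assumed prior on log body height
s_e = 0.02  # assumed residual noise left by the 2D evidence
# d = f * H / obs_px, so log d = log H + log(f / obs_px) + noise
mean = np.array([mu_u, mu_u + np.log(f / obs_px)])
cov = np.array([[s_u**2, s_u**2], [s_u**2, s_u**2 + s_e**2]])

# The marginal std of log distance is sqrt(0.0104), about 0.102 (roughly +/-10% on distance).
# Conditioning on a known 1.9 m height shrinks it to s_e.
d_mean, d_cov = condition_gaussian(mean, cov, known_idx=[0], known_values=[np.log(1.9)])
print(np.exp(d_mean[0]), np.sqrt(d_cov[0, 0]))  # approximately 3.8 m and 0.02
```

The strong correlation between height and distance in `cov` encodes the ambiguity: the image alone constrains their ratio, not each one separately. Once the height is known, the conditional distance follows from d = f H / h_img = 1000 × 1.9 / 500 = 3.8 m.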
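The abstract also states that the output distribution can exploit multi-view observations and that the most likely prediction can be extracted cheaply. A textbook way to illustrate both ideas is the product of independent Gaussian estimates: precisions add, and the fused mean (which is also the mode) is precision-weighted. This is a hypothetical sketch, not necessarily how CondiMen fuses views. It assumes each camera yields a Gaussian estimate of the same 3D point already expressed in a shared world frame (that is, known extrinsics), and all numbers are made up.

```python
import numpy as np


def fuse_gaussians(means, covs):
    """Fuse independent Gaussian estimates of the same vector (product of Gaussians).

    Returns the fused mean, which is also the most likely value, and the fused covariance.
    """
    precision = np.zeros_like(covs[0], dtype=float)
    info = np.zeros_like(means[0], dtype=float)
    for mu, cov in zip(means, covs):
        p = np.linalg.inv(cov)  # precision of this view's estimate
        precision += p
        info += p @ mu
    fused_cov = np.linalg.inv(precision)
    return fused_cov @ info, fused_cov


# Hypothetical estimates of one 3D point (metres, world frame) from two cameras.
# Camera A looks along z, so its depth (z) is uncertain; camera B looks along x.
mean_a, cov_a = np.array([0.0, 1.0, 4.3]), np.diag([0.01, 0.01, 0.25])
mean_b, cov_b = np.array([0.2, 1.0, 4.0]), np.diag([0.25, 0.01, 0.01])

fused_mean, fused_cov = fuse_gaussians([mean_a, mean_b], [cov_a, cov_b])
print(fused_mean)  # each axis leans toward the view that is confident about it
print(np.sqrt(np.diag(fused_cov)))  # smaller than either view's std on every axis
```

Because each camera is confident along the directions where the other is not, the fused estimate is tight on all three axes. Reading off the most likely value of a Gaussian only requires its mean, which is one reason a parametric output distribution can be cheap enough for real-time use.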