MultiPly: Reconstruction of Multiple People from Monocular Video in the Wild
Main authors: | , , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | We present MultiPly, a novel framework to reconstruct multiple people in 3D
from monocular in-the-wild videos. Reconstructing multiple individuals moving
and interacting naturally from monocular in-the-wild videos poses a challenging
task. Addressing it necessitates precise pixel-level disentanglement of
individuals without any prior knowledge about the subjects. Moreover, it
requires recovering intricate and complete 3D human shapes from short video
sequences, intensifying the level of difficulty. To tackle these challenges, we
first define a layered neural representation for the entire scene, composited
by individual human and background models. We learn the layered neural
representation from videos via our layer-wise differentiable volume rendering.
This learning process is further enhanced by our hybrid instance segmentation
approach which combines the self-supervised 3D segmentation and the promptable
2D segmentation module, yielding reliable instance segmentation supervision
even under close human interaction. A confidence-guided optimization
formulation is introduced to optimize the human poses and shape/appearance
alternately. We incorporate effective objectives to refine human poses via
photometric information and impose physically plausible constraints on human
dynamics, leading to temporally consistent 3D reconstructions with high
fidelity. The evaluation of our method shows the superiority over prior art on
publicly available datasets and in-the-wild videos. |
---|---|
DOI: | 10.48550/arxiv.2406.01595 |