LCR-Net++: Multi-Person 2D and 3D Pose Detection in Natural Images

Bibliographic details
Published in:	IEEE Transactions on Pattern Analysis and Machine Intelligence 2020-05, Vol.42 (5), p.1146-1161
Main authors:	Rogez, Gregory; Weinzaepfel, Philippe; Schmid, Cordelia
Format:	Article
Language:	English
Keywords:	2D pose estimation Algorithms Architecture Body parts classification CNN Computer Science Computer Vision and Pattern Recognition Databases, Factual detection Heating systems Human 3D pose estimation Humans Image detection Imaging, Three-Dimensional - methods Joints Localization Neural Networks, Computer Pose estimation Posture - physiology Proposals regression Three dimensional bodies Three-dimensional displays Training data Two dimensional bodies Two dimensional displays

Description
Abstract:	We propose an end-to-end architecture for joint 2D and 3D human pose estimation in natural images. Key to our approach is the generation and scoring of a number of pose proposals per image, which allows us to predict 2D and 3D poses of multiple people simultaneously. Hence, our approach does not require an approximate localization of the humans for initialization. Our Localization-Classification-Regression architecture, named LCR-Net, contains three main components: 1) the pose proposal generator that suggests candidate poses at different locations in the image; 2) a classifier that scores the different pose proposals; and 3) a regressor that refines pose proposals both in 2D and 3D. All three stages share the convolutional feature layers and are trained jointly. The final pose estimation is obtained by integrating over neighboring pose hypotheses, which is shown to improve over a standard non-maximum suppression algorithm. Our method recovers full-body 2D and 3D poses, hallucinating plausible body parts when the persons are partially occluded or truncated by the image boundary. Our approach significantly outperforms the state of the art in 3D pose estimation on Human3.6M, a controlled environment. Moreover, it shows promising results on real images for both single and multi-person subsets of the MPII 2D pose benchmark and demonstrates satisfying 3D pose results even for multi-person images.
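The abstract contrasts standard non-maximum suppression (NMS) with integrating over neighboring pose hypotheses. The following is a minimal, hypothetical sketch of that contrast, not the authors' implementation. It assumes 2D poses only, a mean per-joint pixel distance as the similarity measure, a fixed distance threshold, strictly positive classifier scores, and that the merged pose's score is the sum of its group's scores. All names (`Proposal`, `pose_distance`, `nms`, `integrate`) are illustrative.

```python
from dataclasses import dataclass
from math import hypot


@dataclass
class Proposal:
    # Hypothetical container: one (x, y) per joint, plus the classifier score.
    joints2d: list[tuple[float, float]]
    score: float  # assumed strictly positive


def pose_distance(a: Proposal, b: Proposal) -> float:
    # Assumed similarity measure: mean per-joint Euclidean distance in pixels.
    dists = [hypot(xa - xb, ya - yb)
             for (xa, ya), (xb, yb) in zip(a.joints2d, b.joints2d)]
    return sum(dists) / len(dists)


def nms(proposals: list[Proposal], thresh: float) -> list[Proposal]:
    # Standard NMS: keep the best-scoring pose, discard its close neighbors.
    kept: list[Proposal] = []
    for p in sorted(proposals, key=lambda q: q.score, reverse=True):
        if all(pose_distance(p, k) > thresh for k in kept):
            kept.append(p)
    return kept


def integrate(proposals: list[Proposal], thresh: float) -> list[Proposal]:
    # Instead of discarding neighbors, average them, weighted by score.
    remaining = sorted(proposals, key=lambda q: q.score, reverse=True)
    results: list[Proposal] = []
    while remaining:
        anchor = remaining[0]
        # The anchor is at distance 0 from itself, so the group is never empty.
        group = [p for p in remaining if pose_distance(anchor, p) <= thresh]
        remaining = [p for p in remaining if pose_distance(anchor, p) > thresh]
        total = sum(p.score for p in group)
        merged = [
            (sum(p.score * p.joints2d[j][0] for p in group) / total,
             sum(p.score * p.joints2d[j][1] for p in group) / total)
            for j in range(len(anchor.joints2d))
        ]
        # Assumption: the merged pose is scored by the sum of its group's scores.
        results.append(Proposal(merged, total))
    return results
```

With two overlapping hypotheses for one person and a third far away, `nms` keeps only the top-scoring member of the overlapping pair. In contrast, `integrate` returns a score-weighted average of the pair, which can cancel out small per-proposal localization errors.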
ISSN:	0162-8828, 1939-3539, 2160-9292
DOI:	10.1109/TPAMI.2019.2892985