Center point to pose: Multiple views 3D human pose estimation for multi-person

Bibliographic Details
Published in: PLoS ONE 2022-09, Vol. 17 (9), e0274450
Main authors: Liu, Huan; Wu, Jian; He, Rui
Format: Article
Language: English
Online access: Full text
Description
Summary: 3D human pose estimation has long been an important task in computer vision, especially in crowded scenes where multiple people interact with each other. Many state-of-the-art methods exist for single-view object detection. However, recovering the locations of people in crowded and occluded scenes is complicated by the lack of depth information in a single view, which limits robustness. Multi-view, multi-person human pose estimation has therefore become an effective alternative. Previous multi-view 3D human pose estimation methods generally follow a strategy of associating the joints of the same person across 2D pose estimates. However, incompleteness and noise in the 2D poses are inevitable, and the joint association itself is challenging. To solve this issue, we propose a CTP (Center Point to Pose) network that operates directly in 3D space across multiple views. The 2D joint features from all cameras are projected into a 3D voxel space. Our CTP network regresses the center point of each person as their location, and a 3D bounding box as their activity area; it then estimates a detailed 3D pose within each bounding box. Moreover, the center regression stage of our CTP network is Non-Maximum Suppression (NMS) free, which makes it simpler and more efficient. Our method performs competitively on several public datasets, which demonstrates the efficacy of our center-point-to-pose representation.
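The abstract names two voxel-space steps: projecting per-view 2D joint heatmaps into a shared 3D grid, and picking person centers without an NMS pass. As a rough illustration of how such steps are commonly implemented (in the spirit of voxel-based methods and CenterNet-style peak picking), the PyTorch sketch below uses assumed conventions: one 3x4 pinhole projection matrix per camera, nearest-neighbor sampling, mean aggregation over views, and a 3x3x3 max-pool local-maximum test. The function names and tensor shapes are hypothetical and are not taken from the authors' code.

```python
import torch
import torch.nn.functional as F

def project_heatmaps_to_voxels(heatmaps, proj_mats, grid_centers):
    """Aggregate per-view 2D joint heatmaps into a 3D voxel volume.

    heatmaps:     (num_views, H, W)  2D joint confidence maps
    proj_mats:    (num_views, 3, 4)  pinhole projection matrices (assumed)
    grid_centers: (num_voxels, 3)    world coordinates of voxel centers
    returns:      (num_voxels,)      mean confidence over all views
    """
    num_views, H, W = heatmaps.shape
    ones = torch.ones(grid_centers.shape[0], 1)
    homog = torch.cat([grid_centers, ones], dim=1)       # (N, 4) homogeneous
    volume = torch.zeros(grid_centers.shape[0])
    for v in range(num_views):
        uvw = homog @ proj_mats[v].T                     # project to view v
        depth = uvw[:, 2]
        px = (uvw[:, 0] / depth.clamp(min=1e-6)).round().long()
        py = (uvw[:, 1] / depth.clamp(min=1e-6)).round().long()
        # keep only voxels in front of the camera that land on the image
        inside = (depth > 0) & (px >= 0) & (px < W) & (py >= 0) & (py < H)
        vals = torch.zeros_like(volume)
        vals[inside] = heatmaps[v, py[inside], px[inside]]
        volume += vals
    return volume / num_views

def nms_free_centers(center_volume, threshold=0.5):
    """CenterNet-style peak picking: a voxel is kept as a person center
    only if it is the maximum of its 3x3x3 neighborhood, so overlapping
    detections never arise and no NMS pass is needed.

    center_volume: (D, H, W) predicted center-confidence volume
    returns:       (K, 3) voxel indices of the K detected person centers
    """
    x = center_volume.unsqueeze(0).unsqueeze(0)          # (1, 1, D, H, W)
    pooled = F.max_pool3d(x, kernel_size=3, stride=1, padding=1)
    peaks = (pooled == x) & (x > threshold)              # strict local maxima
    return peaks.squeeze(0).squeeze(0).nonzero()
```

A full system would likely replace the nearest-neighbor lookup with bilinear sampling and the plain mean with a learned or confidence-weighted fusion across views, but the max-pool equality test is the standard trick behind an NMS-free center head: each person contributes a single local maximum rather than a set of overlapping proposals that would need suppressing.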
ISSN: 1932-6203
DOI: 10.1371/journal.pone.0274450