Learning high resolution reservation for human pose estimation

Bibliographic Details
Published in: Multimedia Tools and Applications, 2021-08, Vol. 80 (19), pp. 29251-29265
Main authors: Gao, Bingkun; Ma, Ke; Bi, Hongbo; Wang, Ling; Wu, Chenlei
Format: Article
Language: English
Description
Abstract: Human pose estimation in images and videos is a challenging task in many applications. Most network structures used for pose estimation rely only on the convolutional features of the last layer, which causes a loss of information. In this paper, we propose a multi-scale fusion framework based on the hourglass network for human pose estimation, which effectively captures information at different resolutions. While extracting features at different resolutions, the network continually supplements the high-resolution features. Additionally, we design a depth pyramid residual module to fuse features of various scales. The whole network is built by stacking sub-networks. To better suit settings with limited storage space, we use only a 2-stage stacked network. We test the network on the standard MPII benchmark dataset, where our method achieves a PCKh score of 88.9%, improving the PCK score by 0.7% compared with the original network. Our approach achieves state-of-the-art results.
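
The abstract reports its result as a PCKh score without stating how the score is computed. As a rough illustration only, the sketch below follows the common convention for this metric: a predicted keypoint counts as correct when its distance to the ground-truth keypoint is at most a fraction alpha of that person's head segment length. The function name `pckh`, the data layout, and the default alpha of 0.5 are assumptions made for this example and are not taken from the article.

```python
import math


def pckh(pred, gt, head_sizes, alpha=0.5):
    """Return the fraction of annotated keypoints predicted within
    alpha * head size of the ground truth.

    pred:       list of poses, each a list of (x, y) tuples
    gt:         same layout; an entry of None marks an unannotated joint
    head_sizes: one head segment length (in pixels) per pose
    alpha:      threshold fraction (0.5 is an assumed default)
    """
    correct = 0
    total = 0
    for pred_pose, gt_pose, head in zip(pred, gt, head_sizes):
        for p, g in zip(pred_pose, gt_pose):
            if g is None:
                # Skip joints with no ground-truth annotation.
                continue
            total += 1
            error = math.hypot(p[0] - g[0], p[1] - g[1])
            if error <= alpha * head:
                correct += 1
    # With no annotated joints there is nothing to score.
    return correct / total if total else 0.0


# Hypothetical data: one person, three joints, head size of 10 pixels.
pred = [[(10.0, 10.0), (20.0, 20.0), (30.0, 30.0)]]
gt = [[(10.0, 12.0), (20.0, 40.0), None]]
score = pckh(pred, gt, [10.0])
print(score)
```

Normalizing by head size rather than by the full body bounding box keeps the threshold from changing when a limb is extended or folded, which is the usual reason given for preferring PCKh on articulated poses.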
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-021-10731-4