Learning high resolution reservation for human pose estimation
Published in: | Multimedia tools and applications 2021-08, Vol.80 (19), p.29251-29265 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Human pose estimation in images and videos is a challenging task in many applications. Most network structures used to estimate pose rely only on the convolutional features of the last layer, which causes a loss of information. In this paper, we propose a multi-scale fusion framework based on the hourglass network for human pose estimation, which effectively captures information at different resolutions. While extracting features at different resolutions, the network continually supplements the high-resolution features. Additionally, we design a depth pyramid residual module to fuse features at different scales. The whole network is built by stacking sub-networks. To suit settings with limited storage space, we use only a 2-stage stacked network. We test the network on the standard MPII benchmark dataset, where our method achieves an 88.9% PCKh score, improving on the original network by 0.7%. Our approach achieves state-of-the-art results. |
---|---|
ISSN: | 1380-7501, 1573-7721 |
DOI: | 10.1007/s11042-021-10731-4 |
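
The summary reports its result as a PCKh score on MPII. As a minimal sketch of how such a score is typically computed (not the authors' evaluation code), the function below counts a predicted keypoint as correct when it lies within a fraction `alpha` of the person's head segment length from the annotated position. The function name `pckh`, its parameters, and the sample coordinates are illustrative assumptions; the threshold of 0.5 follows the common PCKh@0.5 convention, which the summary does not state explicitly.

```python
import math


def pckh(predicted, ground_truth, head_sizes, alpha=0.5):
    """Return the percentage of keypoints within alpha * head size of the annotation.

    predicted, ground_truth: one list of (x, y) keypoints per person; a
        ground-truth entry of None marks an unannotated joint and is skipped.
    head_sizes: one head segment length in pixels per person (assumed given).
    alpha: fraction of the head size used as the distance threshold
        (0.5 is assumed here, following the usual PCKh@0.5 convention).
    """
    correct = 0
    total = 0
    for pred_joints, true_joints, head in zip(predicted, ground_truth, head_sizes):
        for pred, true in zip(pred_joints, true_joints):
            if true is None:
                continue  # joint not annotated, so it does not count
            total += 1
            if math.dist(pred, true) <= alpha * head:
                correct += 1
    return 100.0 * correct / total if total else 0.0


# Hypothetical data: one person, three joints, the last one unannotated.
predicted = [[(10.0, 10.0), (50.0, 70.0), (90.0, 120.0)]]
ground_truth = [[(12.0, 11.0), (50.0, 50.0), None]]
head_sizes = [20.0]

score = pckh(predicted, ground_truth, head_sizes)
print(f"PCKh@0.5: {score:.1f}%")
```

With a head size of 20 pixels the threshold is 10 pixels, so the first joint (about 2.2 pixels off) counts as correct and the second (20 pixels off) does not.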