Human pose estimation model based on DiracNets and integral pose regression

Bibliographic details
Published in:	Multimedia Tools and Applications, 2023-09, Vol. 82 (23), pp. 36019-36039
Main authors:	Xu, Xinzheng; Guo, Yanyan; Wang, Xin
Format:	Article
Language:	English
Keywords:	Computer Communication Networks; Computer networks; Computer Science; Data Structures and Information Theory; Datasets; Errors; Feature extraction; Feature maps; Modules; Multimedia Information Systems; Pose estimation; Special Purpose and Application-Based Systems
Online access:	Full text

Description
Abstract:	Human pose estimation has made great progress in recent years. However, many methods enlarge the network's receptive field by applying max pooling, average pooling, or simple downsampling via strided convolution to the feature map, which loses original feature information and introduces quantization errors. To address these problems, we propose the RM-IPR-DDHPE model. Specifically, we first propose the DDHPE model, which uses Mask R-CNN as the backbone network. In this model, we replace the residual module with an improved Dirac network module (DiracNets) to adaptively learn deeper features. In addition, we adopt the detail-preserving pooling (DPP) method, which amplifies spatial changes, to counter the loss of key details in traditional pooling methods. Building on these improvements, we construct the RM-IPR-DDHPE model, based on the Ranger optimizer, the Mish activation function, and integral pose regression, which avoids quantization errors and improves gradient propagation and the network structure. We validate the classification ability of DDHPE on the CIFAR datasets and the performance of the RM-IPR-DDHPE model for predicting human keypoints on the MSCOCO2014 and MPII datasets. The results of DDHPE on CIFAR-10 and CIFAR-100 are 95.27 and 77.51, respectively. The AP, AP50, AP75, APM, and APL of RM-IPR-DDHPE on MSCOCO2014 are 78.0, 93.9, 85.4, 74.3, and 84.9, respectively, and the average accuracy (mAP) over all keypoints on MPII is 94.1. The results show that DDHPE has good feature extraction ability and that the RM-IPR-DDHPE model improves prediction accuracy while resolving the quantization error in the joint point estimation of the DDHPE network.
ISSN:	1380-7501, 1573-7721
DOI:	10.1007/s11042-023-15057-x
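
Illustration: the abstract credits integral pose regression with avoiding quantization errors. The sketch below shows the general idea only and is not the authors' implementation. A hard argmax over a heatmap can only return integer grid coordinates, whereas the integral (soft-argmax) formulation returns the expected coordinate under a softmax-normalized heatmap, which can fall between grid cells. The function names, the toy heatmap, and the `beta` sharpness parameter are all assumptions made for this example.

```python
import math


def hard_argmax_2d(heatmap):
    """Return the (x, y) grid cell holding the first maximum value."""
    best_value, best_x, best_y = float("-inf"), 0, 0
    for y, row in enumerate(heatmap):
        for x, value in enumerate(row):
            if value > best_value:
                best_value, best_x, best_y = value, x, y
    return best_x, best_y


def soft_argmax_2d(heatmap, beta=1.0):
    """Return the expected (x, y) under a softmax over the heatmap.

    beta is a hypothetical sharpness parameter: larger values push the
    result toward the hard argmax.
    """
    peak = max(value for row in heatmap for value in row)
    # Subtract the peak before exponentiating for numerical stability.
    weights = [[math.exp(beta * (value - peak)) for value in row]
               for row in heatmap]
    total = sum(sum(row) for row in weights)
    x = sum(w * col for row in weights for col, w in enumerate(row)) / total
    y = sum(w * r for r, row in enumerate(weights) for w in row) / total
    return x, y


# Toy heatmap (assumed data): the true joint lies between columns 1 and 2.
toy_heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 5.0, 5.0, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]

hard_xy = hard_argmax_2d(toy_heatmap)   # snaps to an integer grid cell
soft_xy = soft_argmax_2d(toy_heatmap)   # can land between grid cells
```

On this symmetric toy heatmap the hard argmax snaps to a single cell, while the soft-argmax estimate sits midway between the two equal peaks. The expectation is also differentiable with respect to the heatmap values, which is what lets coordinate losses be trained end to end in this family of methods.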