MDA-YOLO Person: a 2D human pose estimation model based on YOLO detection framework
Published in: | Cluster computing 2024-12, Vol.27 (9), p.12323-12340 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | Human pose estimation aims to locate and predict the key points of the human body in images or videos. Accurate human pose estimation remains difficult because it requires capturing complex spatial relationships and handling varying body scales. Our work proposes a real-time human pose estimation method based on the anchor-assisted YOLOv7 framework, named MDA-YOLO Person. In this study, we propose Keypoint Augmentation Strategies (KAS) to overcome the challenges faced in human pose estimation and improve the model’s ability to accurately predict keypoints. Furthermore, we introduce the Anchor Adjustment Module (AAM) as a replacement for the original YOLOv7 detection head. By adjusting the parameters associated with the detector’s anchors, we achieve an increased recall rate and enhance the completeness of the pose estimation. Additionally, we incorporate the Multi-Scale Dual-Head Attention (MDA) module, which models the weights of both channel and spatial dimensions at multiple scales, enabling the model to focus on more salient feature information. As a result, our approach outperforms other methods on two large-scale public datasets. MDA-YOLO Person outperforms the baseline model YOLOv7-pose on both the MS COCO 2017 and CrowdPose datasets, with improvements of 2.2% and 3.7% in precision and recall on MS COCO 2017, and 1.9% and 3.5% on CrowdPose, respectively. |
---|---|
ISSN: | 1386-7857 1573-7543 |
DOI: | 10.1007/s10586-024-04608-y |