UformPose: A U-shaped Hierarchical Multi-Scale Keypoint-Aware Framework for Human Pose Estimation

Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2023-04, Vol. 33 (4), p. 1-1
Authors: Wang, You-Jie; Luo, Yan-Min; Bai, Gui-Hu; Guo, Jing-Ming
Format: Article
Language: English
Abstract
Human pose estimation is a fundamental yet challenging task in computer vision. However, difficult scenarios such as invisible keypoints, occlusions, and small-scale persons are still not well handled. In this paper, we present a novel pose estimation framework named UformPose that aims to alleviate these issues. UformPose has two core designs: a Shared Feature Pyramid Stem (SFPS) and a U-shaped hierarchical Multi-scale Keypoint-aware Attention Module (U-MKAM). SFPS is a feature pyramid stem with a sharing mechanism that learns stronger low-level features at the initial stage, and the sharing mechanism facilitates cross-resolution commonality learning. U-MKAM aims to generate high-quality, high-resolution representations by integrating all levels of the backbone's feature representations layer by layer. More importantly, we exploit the flexibility of attention operations for keypoint-aware modeling, which explicitly captures and trades off the dependencies between keypoints. We empirically demonstrate the effectiveness of our framework through competitive pose estimation results on the COCO dataset. Extensive experiments and visual analysis on CrowdPose demonstrate the robustness of our model in crowded scenes.
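The abstract only sketches the keypoint-aware attention idea in U-MKAM. Below is a minimal, illustrative sketch (not the authors' implementation) of how self-attention over per-keypoint feature tokens can capture and trade off dependencies between keypoints; the module name, token construction, shapes, and hyper-parameters are all assumptions for illustration.

```python
# Hypothetical sketch of keypoint-aware attention: one feature token per
# keypoint, self-attention models inter-keypoint dependencies.
import torch
import torch.nn as nn

class KeypointAwareAttention(nn.Module):
    def __init__(self, channels: int = 256, heads: int = 8):
        super().__init__()
        # Multi-head attention lets each keypoint token attend to all others,
        # e.g. an occluded wrist can borrow evidence from the visible elbow.
        self.attn = nn.MultiheadAttention(channels, heads, batch_first=True)
        self.norm = nn.LayerNorm(channels)

    def forward(self, kpt_tokens: torch.Tensor) -> torch.Tensor:
        # kpt_tokens: (B, K, C) features, one token per keypoint, e.g. pooled
        # from the fused high-resolution feature map at coarse keypoint locations.
        out, _ = self.attn(kpt_tokens, kpt_tokens, kpt_tokens)
        return self.norm(kpt_tokens + out)  # residual + norm, Transformer-style

# Usage: 17 COCO keypoints, 256-channel tokens, batch of 2 persons.
tokens = torch.randn(2, 17, 256)
refined = KeypointAwareAttention()(tokens)  # (2, 17, 256)
```

The refined tokens would then be decoded back into heatmaps or offsets; how UformPose actually fuses them with the U-shaped multi-scale features is not specified in this record.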
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2022.3213206