Kinematics-aware spatial-temporal feature transform for 3D human pose estimation
Published in: | Pattern Recognition 2024-06, Vol.150, p.110316, Article 110316 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | 3D human pose estimation plays an important role in various human-machine interactive applications, but effectively extracting and representing the kinematical features of human body structure in video remains a challenge. This paper presents observations about the human body that reveal heuristic patterns in human poses: 1) every kind of human pose exhibits distinct temporal coherence; 2) local joints show evident spatial and temporal correlations even when the person performs complex actions. Based on these patterns, a locally structured feature encoder and a spatial–temporal feature transform are proposed for kinematics-aware feature extraction and enhancement. Unlike existing works, which project every bone joint directly to pose features without distinction, the proposed locally structured feature encoder maps the local connection property of human body structure to kinematical features, which are neural embeddings extracted from both local and global groups of human bone joints. Because the local and global bone-joint groups are pre-defined according to human body kinematics, the kinematical features can represent body kinematics. The kinematical features are then transformed by the proposed spatial–temporal feature transform to enhance the spatial and temporal correlations among human bone joints. The overall framework improves the representation of human body kinematics for 3D pose estimation. Extensive experimental results on commonly used datasets show that the mean per joint position error (MPJPE) is significantly reduced compared with state-of-the-art methods under the same experimental conditions. The improvement is expected to help machines better understand human poses when building human-centered automation systems.
• Spatial–temporal kinematic awareness is studied for 3D human pose estimation.
• A hybrid-kinematical feature encoder extracts kinematical features of the 2D pose.
• A spatial–temporal feature transform enhances the spatial and temporal correlations.
• The fusion of spatial and temporal features promotes the final 3D pose estimation. |
---|---|
ISSN: | 0031-3203, 1873-5142 |
DOI: | 10.1016/j.patcog.2024.110316 |
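
Note on the metric: the abstract reports results as mean per joint position error (MPJPE). As a minimal sketch, assuming poses are given as per-frame lists of (x, y, z) joint coordinates, MPJPE can be computed as the average Euclidean distance between predicted and ground-truth joints. The function name `mpjpe` and the nested-list data layout are illustrative assumptions, not taken from the paper, and the sketch omits the root-joint alignment that evaluation protocols often apply before measuring the error.

```python
import math
from typing import Sequence

# Assumed layout (not from the paper): a joint is (x, y, z),
# a pose is a sequence of joints, a clip is a sequence of poses.
Joint = Sequence[float]
Pose = Sequence[Joint]


def mpjpe(predicted: Sequence[Pose], target: Sequence[Pose]) -> float:
    """Average Euclidean distance between matching joints over all frames."""
    if len(predicted) != len(target):
        raise ValueError("predicted and target must have the same number of frames")

    total_distance = 0.0
    joint_count = 0
    for pred_pose, true_pose in zip(predicted, target):
        if len(pred_pose) != len(true_pose):
            raise ValueError("each frame must have the same number of joints")
        for pred_joint, true_joint in zip(pred_pose, true_pose):
            # math.dist gives the Euclidean distance between two points.
            total_distance += math.dist(pred_joint, true_joint)
            joint_count += 1

    if joint_count == 0:
        raise ValueError("at least one joint is required")
    return total_distance / joint_count


if __name__ == "__main__":
    # Two frames with two joints each; only one joint is off, by 5 units.
    target = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
              [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]]
    predicted = [[(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
                 [(3.0, 4.0, 0.0), (1.0, 0.0, 0.0)]]
    print(mpjpe(predicted, target))  # 1.25
```

The result is in the same unit as the input coordinates, so a lower value means the predicted joints lie closer to the ground truth on average.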