Kinematics modeling network for video-based human pose estimation

Bibliographic details
Published in: Pattern Recognition, 2024-06, Vol. 150, p. 110287, Article 110287
Main authors: Dang, Yonghao; Yin, Jianqin; Zhang, Shaojie; Liu, Jiping; Hu, Yanzhu
Format: Article
Language: English
Online access: Full text
Description
Summary: Estimating human poses from videos is critical in human–computer interaction. During human movement, joints cooperate rather than move independently, so they are correlated both spatially and temporally. Despite the positive results of previous approaches, most of them focus on modeling the spatial correlation between joints and integrate features along the temporal dimension only in a straightforward way, which ignores the temporal correlation between joints. In this work, we propose a plug-and-play kinematics modeling module (KMM) that explicitly models temporal correlations between joints across different frames by calculating their temporal similarity. In this way, KMM can capture motion cues of the current joint relative to all joints at different times. In addition, we formulate video-based human pose estimation as a Markov Decision Process and design a novel kinematics modeling network (KIMNet) to simulate the Markov Chain, allowing KIMNet to locate joints recursively. Our approach achieves state-of-the-art results on two challenging benchmarks. In particular, KIMNet shows robustness to occlusion. Code will be released at https://github.com/YHDang/KIMNet.
•We propose a plug-and-play kinematics modeling module to explicitly model temporal correlations between joints across frames.
•We formulate video-based human pose estimation as a Markov Decision Process and present KIMNet to simulate the Markov Chain.
•The proposed KIMNet locates an occluded joint by integrating joint information from other frames.
•The proposed KIMNet achieves state-of-the-art results and shows superior performance in occluded scenes.
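
Illustrative note: the summary describes computing a temporal similarity between joints in different frames and locating joints recursively. The sketch below is one possible reading of that idea, not the authors' KMM or KIMNet. The use of cosine similarity, the softmax normalization, the residual aggregation, and all function names (`temporal_similarity`, `aggregate`, `refine_sequence`) are assumptions made for illustration only; each joint is represented by a plain feature vector.

```python
import math


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    if nu == 0 or nv == 0:
        return 0.0
    return dot / (nu * nv)


def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def temporal_similarity(curr, other):
    """Row i holds the normalized similarity of joint i in the current
    frame to every joint in another frame (assumed: cosine + softmax)."""
    return [softmax([cosine(c, o) for o in other]) for c in curr]


def aggregate(curr, other, weights):
    """Add to each current joint a similarity-weighted sum of the other
    frame's joint features (assumed residual-style fusion)."""
    fused = []
    for c, row in zip(curr, weights):
        context = [sum(w * o[d] for w, o in zip(row, other))
                   for d in range(len(c))]
        fused.append([a + b for a, b in zip(c, context)])
    return fused


def refine_sequence(frames):
    """Process frames in order; each frame is refined using only the
    refined previous frame, mimicking a first-order Markov chain."""
    refined = [frames[0]]
    for curr in frames[1:]:
        weights = temporal_similarity(curr, refined[-1])
        refined.append(aggregate(curr, refined[-1], weights))
    return refined


if __name__ == "__main__":
    # Two frames, two joints each, 2-D toy features (hypothetical data).
    frames = [[[1.0, 0.0], [0.0, 1.0]],
              [[1.0, 0.0], [0.0, 1.0]]]
    print(temporal_similarity(frames[1], frames[0]))
    print(refine_sequence(frames))
```

In this toy setup, a joint whose own features are uninformative (for example, because it is occluded) would still receive context from every joint in the previous frame, which is the intuition the summary gives for robustness to occlusion.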
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2024.110287