An end-to-end framework for unconstrained monocular 3D hand pose estimation

Full description

Bibliographic details
Published in:	Pattern recognition 2021-07, Vol.115, p.107892, Article 107892
Main authors:	Sharma, Sanjeev; Huang, Shaoli
Format:	Article
Language:	English
Subjects:	Computer Science; Computer Science, Artificial Intelligence; Computer vision; Deep learning; Engineering; Engineering, Electrical & Electronic; Hand detection; Hand pose estimation; Hand tracking; Science & Technology; Technology
Online access:	Full text

Description
Summary:	• Our proposed framework can robustly infer 3D hand pose without requiring a prior. • Novel keypoint-based hand detector robust to confusing background and adjacent hands. • Two anatomy-based constraints for aiding 3D hand pose estimation network performance. • An end-to-end pipeline with state-of-the-art performance on several datasets. This work addresses the challenging problem of unconstrained 3D hand pose estimation from monocular RGB images. Most existing approaches assume that some prior knowledge of the hand (such as hand location and side information) is available for 3D hand pose estimation, which restricts their use in unconstrained environments. We therefore present an end-to-end framework that robustly predicts hand prior information and accurately infers 3D hand pose by learning ConvNet models while using only keypoint annotations. To enhance the hand detector’s robustness, we propose a novel keypoint-based method that simultaneously predicts hand regions and side labels, unlike existing methods, which suffer from background color confusion caused by using segmentation or detection-based technology. Moreover, inspired by the human hand’s biological structure, we introduce two geometric constraints directly into the 3D coordinate prediction, which further improves its performance. Experimental results show that our proposed framework outperforms state-of-the-art methods on standard benchmark datasets while providing robust predictions.
ISSN:	0031-3203, 1873-5142
DOI:	10.1016/j.patcog.2021.107892