A multi-branch hand pose estimation network with joint-wise feature extraction and fusion

Bibliographic details
Published in: Signal processing. Image communication, 2020-02, Vol. 81, p. 115692, Article 115692
Main authors: Li, Xuefeng; Zhou, Yidan; Sun, Yi; Lin, Xiangbo; Ma, Xiaohong
Format: Article
Language: English
Description
Summary: Most existing deep learning-based methods treat 3D hand pose estimation from a single depth image as a detection-based or regression-based problem, an approach that does not fully exploit the geometry of the hand, such as its structural and physical constraints. To overcome these weaknesses, we design a network with three simple parallel branches that correspond to the three functional parts of the hand. This design is motivated by the biological viewpoint that each finger plays a different role in grasping and manipulation. In each branch, we perform a more detailed regression in two stages, top-down joint location regression followed by bottom-up hand pose regression, which exploits both the local and global structure of the hand. Finally, we further use the hand structure and physical constraints to refine each joint by its auxiliary points. The proposed network is a unified structure and function model that is more appropriate for hand pose estimation. Our system does not require pose pre-processing or feedback, since it can be trained and can predict end to end. The experimental results on three public datasets demonstrate that the proposed system achieves performance comparable to state-of-the-art methods.
• Refined feature extraction achieves high accuracy for pose estimation.
• Intermediate supervision in multi-stage feature fusion helps network training.
• Auxiliary points are used as prior knowledge of hand structure.
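The branch-and-fusion structure described in the summary can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: this record gives no layer sizes, joint counts, or joint groupings, so `FEATURE_DIM`, `BRANCH_DIM`, the `BRANCH_JOINTS` split, and the dense layers are all assumptions chosen for illustration, and the auxiliary-point refinement step is left out entirely.

```python
import numpy as np

rng = np.random.default_rng(0)

# All sizes and the joint grouping below are hypothetical placeholders;
# the catalog record does not specify them.
FEATURE_DIM = 64   # size of the shared depth-image feature vector
BRANCH_DIM = 32    # size of each branch's feature vector
BRANCH_JOINTS = {"part_a": 3, "part_b": 3, "part_c": 8}


def linear(in_dim, out_dim):
    """Return randomly initialised (weights, bias) for a dense layer."""
    return rng.normal(0.0, 0.1, (in_dim, out_dim)), np.zeros(out_dim)


def dense(x, params, relu=True):
    """Apply a dense layer, optionally followed by ReLU."""
    weights, bias = params
    y = x @ weights + bias
    return np.maximum(y, 0.0) if relu else y


def build_network():
    """Create one feature/joint head per branch plus a fusion head."""
    net = {"branches": {}}
    for name, n_joints in BRANCH_JOINTS.items():
        net["branches"][name] = {
            "features": linear(FEATURE_DIM, BRANCH_DIM),
            "joints": linear(BRANCH_DIM, n_joints * 3),
        }
    total_joints = sum(BRANCH_JOINTS.values())
    net["fusion"] = linear(BRANCH_DIM * len(BRANCH_JOINTS), total_joints * 3)
    return net


def forward(net, shared_features):
    """Stage 1: per-branch joint regression. Stage 2: fused full-hand pose."""
    branch_feats, stage1 = [], {}
    for name, n_joints in BRANCH_JOINTS.items():
        branch = net["branches"][name]
        feat = dense(shared_features, branch["features"])
        joints = dense(feat, branch["joints"], relu=False)
        stage1[name] = joints.reshape(-1, n_joints, 3)
        branch_feats.append(feat)
    # Fuse the branch features and regress all joints of the hand at once.
    fused = np.concatenate(branch_feats, axis=1)
    pose = dense(fused, net["fusion"], relu=False)
    return stage1, pose.reshape(-1, sum(BRANCH_JOINTS.values()), 3)
```

Under these assumptions, `forward(build_network(), x)` with `x` of shape `(batch, FEATURE_DIM)` returns the per-branch joint estimates and the fused hand pose. Intermediate supervision, as mentioned in the highlights, would presumably attach a loss to the stage-1 outputs as well as to the final pose.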
ISSN: 0923-5965, 1879-2677
DOI: 10.1016/j.image.2019.115692