A multi-branch hand pose estimation network with joint-wise feature extraction and fusion
Published in: | Signal processing. Image communication 2020-02, Vol. 81, p. 115692, Article 115692 |
---|---|
Format: | Article |
Language: | English |
Abstract: | Most existing deep learning methods treat 3D hand pose estimation from a single depth image as a detection-based or regression-based problem, an approach that does not fully exploit the geometry of the hand, such as its structural and physical constraints. To overcome this weakness, we design a network with three simple parallel branches that correspond to the three functional parts of the hand. This design is motivated by the biological viewpoint that each finger plays a different role in grasping and manipulation. In each branch, we perform a more detailed regression in two stages (top-down joint location regression followed by bottom-up hand pose regression), which exploits both the local and the global structure of the hand. Finally, we use the hand structure and physical constraints to refine each joint by its auxiliary points. The proposed network is a unified structure and function model that is better suited to hand pose estimation. The system requires no pose pre-processing or feedback, since it is trained and makes predictions end to end. Experimental results on three public datasets show that the proposed system achieves performance comparable to state-of-the-art methods.
• Refined feature extraction achieves high accuracy for pose estimation.
• Intermediate supervision in multi-stage feature fusion helps network training.
• Auxiliary points are used as prior knowledge of hand structure. |
---|---|
ISSN: | 0923-5965, 1879-2677 |
DOI: | 10.1016/j.image.2019.115692 |
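
The data flow described in the abstract (per-part branches, fusion into one hand pose, refinement of a joint by auxiliary points) can be illustrated with a small pure-Python sketch. This is not the authors' network: the six-joint toy hand, the assignment of joints to the `thumb`, `index`, and `other_fingers` branches, and the averaging rule in `refine_joint` are all assumptions made for illustration, since the record does not specify any of them.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]

# Hypothetical split of a 6-joint toy hand into three functional parts.
# The abstract does not say which joints belong to which branch.
BRANCH_JOINTS: Dict[str, List[int]] = {
    "thumb": [0, 1],
    "index": [2, 3],
    "other_fingers": [4, 5],
}


def fuse_branches(per_branch: Dict[str, List[Point]]) -> List[Point]:
    """Bottom-up step: place each branch's joints back into one full pose."""
    n_joints = sum(len(ids) for ids in BRANCH_JOINTS.values())
    pose: List[Point] = [(0.0, 0.0, 0.0)] * n_joints
    for name, joint_ids in BRANCH_JOINTS.items():
        for joint_id, point in zip(joint_ids, per_branch[name]):
            pose[joint_id] = point
    return pose


def refine_joint(joint: Point, auxiliary: List[Point]) -> Point:
    """Assumed refinement rule: average the joint with its auxiliary points."""
    points = [joint] + list(auxiliary)
    n = len(points)
    x = sum(p[0] for p in points) / n
    y = sum(p[1] for p in points) / n
    z = sum(p[2] for p in points) / n
    return (x, y, z)


# Stand-in outputs of the three branches (in practice, network predictions).
per_branch = {
    "thumb": [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)],
    "index": [(0.0, 1.0, 0.0), (0.0, 2.0, 0.0)],
    "other_fingers": [(0.0, 0.0, 1.0), (0.0, 0.0, 2.0)],
}
pose = fuse_branches(per_branch)

# Refine joint 1 using two made-up auxiliary points near it.
pose[1] = refine_joint(pose[1], [(3.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```

Keeping the joint-to-branch mapping in a single table means the fusion step needs no knowledge of how each branch produced its estimate; only the joint indices have to agree.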