Lightweight Architecture for Real-Time Hand Pose Estimation with Deep Supervision
The high demand for computational resources severely hinders the deployment of deep learning applications in resource-limited devices. In this work, we investigate the under-studied but practically important network efficiency problem and present a new, lightweight architecture for hand pose estimat...
Gespeichert in:
Veröffentlicht in: | Symmetry (Basel) 2019-04, Vol.11 (4), p.585 |
---|---|
Hauptverfasser: | , , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | The high demand for computational resources severely hinders the deployment of deep learning applications in resource-limited devices. In this work, we investigate the under-studied but practically important network efficiency problem and present a new, lightweight architecture for hand pose estimation. Our architecture is essentially a deeply-supervised pruned network in which less important layers and branches are removed to achieve a higher real-time inference target on resource-constrained devices without much accuracy compromise. We further make deployment optimization to facilitate the parallel execution capability of central processing units (CPUs). We conduct experiments on NYU and ICVL datasets and develop a demo1 using the RealSense camera. Experimental results show our lightweight network achieves an average running time of 32 ms (31.3 FPS, the original is 22.7 FPS) before deployment optimization. Meanwhile, the model is only about half parameters size of the original one with 11.9 mm mean joint error. After the further optimization with OpenVINO, the optimized model can run at 56 FPS on CPUs in contrast to 44 FPS running on a graphics processing unit (GPU) (Tensorflow) and it can achieve the real-time goal. |
---|---|
ISSN: | 2073-8994 2073-8994 |
DOI: | 10.3390/sym11040585 |