An embedded implementation of CNN-based hand detection and orientation estimation algorithm

Full description

Bibliographic details
Published in: Machine Vision and Applications, 2019-09, Vol. 30 (6), p. 1071-1082
Main authors: Yang, Li, Qi, Zhi, Liu, Zeheng, Liu, Hao, Ling, Ming, Shi, Longxing, Liu, Xinning
Format: Article
Language: English
Subjects:
Online access: Full text
Description
Abstract: Hand detection is an essential step in many tasks, including HCI applications. However, robustly detecting varied hands under cluttered backgrounds, motion blur, or changing light remains a challenging problem. Recently, object detection methods using CNN models have significantly improved the accuracy of hand detection, yet at a high computational expense. In this paper, we propose a lightweight CNN that uses a modified MobileNet as the feature extractor within the SSD framework to achieve robust and fast detection of hand location and orientation. The network generates a set of feature maps of various resolutions to detect hands of different sizes. To improve robustness, we also employ a top-down feature fusion architecture that integrates context information across feature levels. For accurate estimation of hand orientation by the CNN, we estimate the projections of two orthogonal vectors along the horizontal and vertical axes and then recover the size and orientation of a bounding box that exactly encloses the hand. To deploy the detection algorithm on the embedded platform Jetson TK1, we optimize the implementations of the building modules of the CNN. Evaluated on the challenging Oxford hand dataset, our method (the code is available at https://github.com/yangli18/hand_detection ) reaches 83.2% average precision at 139 FPS on an NVIDIA Titan X, outperforming previous methods in both accuracy and efficiency. The embedded implementation of our algorithm reaches a processing speed of 16 FPS, which basically meets the requirement of real-time processing.
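The orientation-recovery step described in the abstract can be illustrated with a small sketch. It assumes the network predicts, for each hand, the horizontal and vertical projections of two orthogonal side vectors of the rotated bounding box; the width, height, and rotation angle then follow from elementary trigonometry. The function name and input convention here are hypothetical illustrations, not the paper's actual interface:

```python
import math

def recover_box(v1, v2):
    """Recover (width, height, angle) of a rotated bounding box from the
    axis projections of its two orthogonal side vectors.

    v1, v2 -- (x, y) tuples: the projections of the box's side vectors
    along the horizontal and vertical image axes (hypothetical layout).
    """
    w = math.hypot(v1[0], v1[1])       # length of the first side vector
    h = math.hypot(v2[0], v2[1])       # length of the orthogonal side vector
    theta = math.atan2(v1[1], v1[0])   # orientation taken from the first vector
    return w, h, theta
```

For example, projections (3, 4) and (-4, 3) describe a square box of side 5 rotated by atan2(4, 3) ≈ 53°; because the two vectors are constrained to be orthogonal, one angle suffices to fix the box orientation.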
ISSN:0932-8092
1432-1769
DOI:10.1007/s00138-019-01038-4