An embedded implementation of CNN-based hand detection and orientation estimation algorithm

Hand detection is an essential step to support many tasks including HCI applications. However, detecting various hands robustly under conditions of cluttered backgrounds, motion blur or changing light is still a challenging problem. Recently, object detection methods using CNN models have significan...

Full description

Bibliographic details
Published in:	Machine vision and applications 2019-09, Vol.30 (6), p.1071-1082
Main authors:	Yang, Li; Qi, Zhi; Liu, Zeheng; Liu, Hao; Ling, Ming; Shi, Longxing; Liu, Xinning
Format:	Article
Language:	English
Subjects:	Algorithms; Communications Engineering; Computer Science; Feature extraction; Feature maps; Image Processing and Computer Vision; Model accuracy; Networks; Object recognition; Orientation; Original Paper; Pattern Recognition; Vision systems
Online access:	Full text

container_end_page	1082
container_issue	6
container_start_page	1071
container_title	Machine vision and applications
container_volume	30
creator	Yang, Li; Qi, Zhi; Liu, Zeheng; Liu, Hao; Ling, Ming; Shi, Longxing; Liu, Xinning
description	Hand detection is an essential step to support many tasks including HCI applications. However, detecting various hands robustly under conditions of cluttered backgrounds, motion blur or changing light is still a challenging problem. Recently, object detection methods using CNN models have significantly improved the accuracy of hand detection, yet at a high computational expense. In this paper, we propose a lightweight CNN, which uses a modified MobileNet as the feature extractor together with the SSD framework to achieve robust and fast detection of hand location and orientation. The network generates a set of feature maps of various resolutions to detect hands of different sizes. To improve robustness, we also employ a top-down feature fusion architecture that integrates context information across levels of features. For an accurate estimation of hand orientation by the CNN, we estimate the projections of two orthogonal vectors along the horizontal and vertical axes and then recover the size and orientation of a bounding box exactly enclosing the hand. To deploy the detection algorithm on the Jetson TK1 embedded platform, we optimize the implementations of the building modules of the CNN. Evaluated on the challenging Oxford hand dataset, our method (the code is available at https://github.com/yangli18/hand_detection ) reaches 83.2% average precision at 139 FPS on an NVIDIA Titan X, outperforming previous methods in both accuracy and efficiency. The embedded implementation of our algorithm reaches a processing speed of 16 FPS, which largely meets the requirement of real-time processing.
doi_str_mv	10.1007/s00138-019-01038-4
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2277254921</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2277254921</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-d8ddee7a13900d4b6bd6d45dc139d058bacaed285de45b7abfad0de9ab6437863</originalsourceid><addsrcrecordid>eNp9ULFOwzAQtRBIlMIPMEViDpwdJ47HqgKKVJUFJgbLzl3aVE1S7HTg73EbBBvD6Z793rs7PcZuOdxzAPUQAHhWpsB1LIhInrEJl5lIuSr0OZuAjrgELS7ZVQhbAJBKyQn7mHUJtY4QCZOm3e-opW6wQ9N3SV8n89UqdTZEbmM7TJAGqk7c8dX75ldLYWjaEdrdOjLDpr1mF7XdBbr56VP2_vT4Nl-ky9fnl_lsmVYZ10OKZVxOyvJMA6B0hcMCZY5V_EDIS2crSyjKHEnmTllXWwQkbV0hM1UW2ZTdjXP3vv88xEvMtj_4Lq40QiglcqkFjyoxqirfh-CpNnsfT_ZfhoM5hmjGEE0M0ZxCNDKastEUorhbk_8b_Y_rG--QdoE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2277254921</pqid></control><display><type>article</type><title>An embedded implementation of CNN-based hand detection and orientation estimation algorithm</title><source>Springer Nature - Complete Springer Journals</source><creator>Yang, Li ; Qi, Zhi ; Liu, Zeheng ; Liu, Hao ; Ling, Ming ; Shi, Longxing ; Liu, Xinning</creator><creatorcontrib>Yang, Li ; Qi, Zhi ; Liu, Zeheng ; Liu, Hao ; Ling, Ming ; Shi, Longxing ; Liu, Xinning</creatorcontrib><description>Hand detection is an essential step to support many tasks including HCI applications. However, detecting various hands robustly under conditions of cluttered backgrounds, motion blur or changing light is still a challenging problem. Recently, object detection methods using CNN models have significantly improved the accuracy of hand detection yet at a high computational expense. In this paper, we propose a light CNN network, which uses a modified MobileNet as the feature extractor in company with the SSD framework to achieve robust and fast detection of hand location and orientation. 
The network generates a set of feature maps of various resolutions to detect hands of different sizes. In order to improve the robustness, we also employ a top-down feature fusion architecture that integrates context information across levels of features. For an accurate estimation of hand orientation by CNN, we manage to estimate two orthogonal vectors’ projections along the horizontal and vertical axes and then recover the size and orientation of a bounding box exactly enclosing the hand. In order to deploy the detection algorithm on embedded platform Jetson TK1, we optimize the implementations of the building modules in the CNN network. Evaluated on the challenging Oxford hand dataset, our method (the code is available at https://github.com/yangli18/hand_detection ) reaches 83.2% average precision at 139 FPS on a NVIDIA Titan X, outperforming the previous methods both in accuracy and efficiency. The embedded implementation of our algorithm has reached the processing speed of 16 FPS, which basically meets the requirement of real-time processing.</description><identifier>ISSN: 0932-8092</identifier><identifier>EISSN: 1432-1769</identifier><identifier>DOI: 10.1007/s00138-019-01038-4</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Algorithms ; Communications Engineering ; Computer Science ; Feature extraction ; Feature maps ; Image Processing and Computer Vision ; Model accuracy ; Networks ; Object recognition ; Orientation ; Original Paper ; Pattern Recognition ; Vision systems</subject><ispartof>Machine vision and applications, 2019-09, Vol.30 (6), p.1071-1082</ispartof><rights>Springer-Verlag GmbH Germany, part of Springer Nature 2019</rights><rights>Machine Vision and Applications is a copyright of Springer, (2019). 
All Rights Reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-d8ddee7a13900d4b6bd6d45dc139d058bacaed285de45b7abfad0de9ab6437863</citedby><cites>FETCH-LOGICAL-c319t-d8ddee7a13900d4b6bd6d45dc139d058bacaed285de45b7abfad0de9ab6437863</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s00138-019-01038-4$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s00138-019-01038-4$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids></links><search><creatorcontrib>Yang, Li</creatorcontrib><creatorcontrib>Qi, Zhi</creatorcontrib><creatorcontrib>Liu, Zeheng</creatorcontrib><creatorcontrib>Liu, Hao</creatorcontrib><creatorcontrib>Ling, Ming</creatorcontrib><creatorcontrib>Shi, Longxing</creatorcontrib><creatorcontrib>Liu, Xinning</creatorcontrib><title>An embedded implementation of CNN-based hand detection and orientation estimation algorithm</title><title>Machine vision and applications</title><addtitle>Machine Vision and Applications</addtitle><description>Hand detection is an essential step to support many tasks including HCI applications. However, detecting various hands robustly under conditions of cluttered backgrounds, motion blur or changing light is still a challenging problem. Recently, object detection methods using CNN models have significantly improved the accuracy of hand detection yet at a high computational expense. In this paper, we propose a light CNN network, which uses a modified MobileNet as the feature extractor in company with the SSD framework to achieve robust and fast detection of hand location and orientation. 
The network generates a set of feature maps of various resolutions to detect hands of different sizes. In order to improve the robustness, we also employ a top-down feature fusion architecture that integrates context information across levels of features. For an accurate estimation of hand orientation by CNN, we manage to estimate two orthogonal vectors’ projections along the horizontal and vertical axes and then recover the size and orientation of a bounding box exactly enclosing the hand. In order to deploy the detection algorithm on embedded platform Jetson TK1, we optimize the implementations of the building modules in the CNN network. Evaluated on the challenging Oxford hand dataset, our method (the code is available at https://github.com/yangli18/hand_detection ) reaches 83.2% average precision at 139 FPS on a NVIDIA Titan X, outperforming the previous methods both in accuracy and efficiency. The embedded implementation of our algorithm has reached the processing speed of 16 FPS, which basically meets the requirement of real-time processing.</description><subject>Algorithms</subject><subject>Communications Engineering</subject><subject>Computer Science</subject><subject>Feature extraction</subject><subject>Feature maps</subject><subject>Image Processing and Computer Vision</subject><subject>Model accuracy</subject><subject>Networks</subject><subject>Object recognition</subject><subject>Orientation</subject><subject>Original Paper</subject><subject>Pattern Recognition</subject><subject>Vision 
systems</subject><issn>0932-8092</issn><issn>1432-1769</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNp9ULFOwzAQtRBIlMIPMEViDpwdJ47HqgKKVJUFJgbLzl3aVE1S7HTg73EbBBvD6Z793rs7PcZuOdxzAPUQAHhWpsB1LIhInrEJl5lIuSr0OZuAjrgELS7ZVQhbAJBKyQn7mHUJtY4QCZOm3e-opW6wQ9N3SV8n89UqdTZEbmM7TJAGqk7c8dX75ldLYWjaEdrdOjLDpr1mF7XdBbr56VP2_vT4Nl-ky9fnl_lsmVYZ10OKZVxOyvJMA6B0hcMCZY5V_EDIS2crSyjKHEnmTllXWwQkbV0hM1UW2ZTdjXP3vv88xEvMtj_4Lq40QiglcqkFjyoxqirfh-CpNnsfT_ZfhoM5hmjGEE0M0ZxCNDKastEUorhbk_8b_Y_rG--QdoE</recordid><startdate>20190901</startdate><enddate>20190901</enddate><creator>Yang, Li</creator><creator>Qi, Zhi</creator><creator>Liu, Zeheng</creator><creator>Liu, Hao</creator><creator>Ling, Ming</creator><creator>Shi, Longxing</creator><creator>Liu, Xinning</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20190901</creationdate><title>An embedded implementation of CNN-based hand detection and orientation estimation algorithm</title><author>Yang, Li ; Qi, Zhi ; Liu, Zeheng ; Liu, Hao ; Ling, Ming ; Shi, Longxing ; Liu, Xinning</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-d8ddee7a13900d4b6bd6d45dc139d058bacaed285de45b7abfad0de9ab6437863</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>Communications Engineering</topic><topic>Computer 
Science</topic><topic>Feature extraction</topic><topic>Feature maps</topic><topic>Image Processing and Computer Vision</topic><topic>Model accuracy</topic><topic>Networks</topic><topic>Object recognition</topic><topic>Orientation</topic><topic>Original Paper</topic><topic>Pattern Recognition</topic><topic>Vision systems</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yang, Li</creatorcontrib><creatorcontrib>Qi, Zhi</creatorcontrib><creatorcontrib>Liu, Zeheng</creatorcontrib><creatorcontrib>Liu, Hao</creatorcontrib><creatorcontrib>Ling, Ming</creatorcontrib><creatorcontrib>Shi, Longxing</creatorcontrib><creatorcontrib>Liu, Xinning</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><jtitle>Machine vision and applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yang, Li</au><au>Qi, Zhi</au><au>Liu, 
Zeheng</au><au>Liu, Hao</au><au>Ling, Ming</au><au>Shi, Longxing</au><au>Liu, Xinning</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>An embedded implementation of CNN-based hand detection and orientation estimation algorithm</atitle><jtitle>Machine vision and applications</jtitle><stitle>Machine Vision and Applications</stitle><date>2019-09-01</date><risdate>2019</risdate><volume>30</volume><issue>6</issue><spage>1071</spage><epage>1082</epage><pages>1071-1082</pages><issn>0932-8092</issn><eissn>1432-1769</eissn><abstract>Hand detection is an essential step to support many tasks including HCI applications. However, detecting various hands robustly under conditions of cluttered backgrounds, motion blur or changing light is still a challenging problem. Recently, object detection methods using CNN models have significantly improved the accuracy of hand detection yet at a high computational expense. In this paper, we propose a light CNN network, which uses a modified MobileNet as the feature extractor in company with the SSD framework to achieve robust and fast detection of hand location and orientation. The network generates a set of feature maps of various resolutions to detect hands of different sizes. In order to improve the robustness, we also employ a top-down feature fusion architecture that integrates context information across levels of features. For an accurate estimation of hand orientation by CNN, we manage to estimate two orthogonal vectors’ projections along the horizontal and vertical axes and then recover the size and orientation of a bounding box exactly enclosing the hand. In order to deploy the detection algorithm on embedded platform Jetson TK1, we optimize the implementations of the building modules in the CNN network. 
Evaluated on the challenging Oxford hand dataset, our method (the code is available at https://github.com/yangli18/hand_detection ) reaches 83.2% average precision at 139 FPS on a NVIDIA Titan X, outperforming the previous methods both in accuracy and efficiency. The embedded implementation of our algorithm has reached the processing speed of 16 FPS, which basically meets the requirement of real-time processing.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s00138-019-01038-4</doi><tpages>12</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0932-8092
ispartof	Machine vision and applications, 2019-09, Vol.30 (6), p.1071-1082
issn	0932-8092 1432-1769
language	eng
recordid	cdi_proquest_journals_2277254921
source	Springer Nature - Complete Springer Journals
subjects	Algorithms; Communications Engineering; Computer Science; Feature extraction; Feature maps; Image Processing and Computer Vision; Model accuracy; Networks; Object recognition; Orientation; Original Paper; Pattern Recognition; Vision systems
title	An embedded implementation of CNN-based hand detection and orientation estimation algorithm
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T13%3A32%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20embedded%20implementation%20of%20CNN-based%20hand%20detection%20and%20orientation%20estimation%20algorithm&rft.jtitle=Machine%20vision%20and%20applications&rft.au=Yang,%20Li&rft.date=2019-09-01&rft.volume=30&rft.issue=6&rft.spage=1071&rft.epage=1082&rft.pages=1071-1082&rft.issn=0932-8092&rft.eissn=1432-1769&rft_id=info:doi/10.1007/s00138-019-01038-4&rft_dat=%3Cproquest_cross%3E2277254921%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2277254921&rft_id=info:pmid/&rfr_iscdi=true
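
The orientation step summarized in the description field (estimating the projections of two orthogonal vectors on the horizontal and vertical axes, then recovering the size and orientation of the bounding box) can be illustrated with a minimal Python sketch. It assumes, hypothetically, that the two vectors are the half-axes of the box measured from its centre. The record does not state the paper's exact parameterization, and the name `recover_box` and its arguments are illustrative only, not taken from the authors' code.

```python
import math


def recover_box(cx, cy, ax, ay, bx, by):
    """Recover a rotated box from two orthogonal half-axis vectors.

    Assumption (not from the record): (ax, ay) and (bx, by) are the
    horizontal/vertical projections of the box's two half-axes,
    both starting at the box centre (cx, cy).
    """
    # Side lengths are twice the half-axis magnitudes.
    length = 2.0 * math.hypot(ax, ay)
    width = 2.0 * math.hypot(bx, by)
    # Orientation is the direction of the first half-axis, in radians.
    angle = math.atan2(ay, ax)
    # Corners are the centre plus/minus each half-axis.
    corners = [
        (cx + sa * ax + sb * bx, cy + sa * ay + sb * by)
        for sa, sb in ((1, 1), (1, -1), (-1, -1), (-1, 1))
    ]
    return length, width, angle, corners
```

For instance, with the centre at the origin and half-axes (3, 4) and (-0.8, 0.6), the sketch yields a box of length 10 and width 2 (up to floating-point rounding), oriented along the direction of (3, 4). Regressing vector projections rather than an angle directly avoids the wrap-around discontinuity of angular targets, which is one plausible reason for such a design.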