An embedded implementation of CNN-based hand detection and orientation estimation algorithm

Hand detection is an essential step to support many tasks including HCI applications. However, detecting various hands robustly under conditions of cluttered backgrounds, motion blur or changing light is still a challenging problem. Recently, object detection methods using CNN models have significan...

Full description

Bibliographic details
Published in: Machine vision and applications 2019-09, Vol.30 (6), p.1071-1082
Main authors: Yang, Li; Qi, Zhi; Liu, Zeheng; Liu, Hao; Ling, Ming; Shi, Longxing; Liu, Xinning
Format: Article
Language: eng
Online access: Full text
container_end_page 1082
container_issue 6
container_start_page 1071
container_title Machine vision and applications
container_volume 30
creator Yang, Li
Qi, Zhi
Liu, Zeheng
Liu, Hao
Ling, Ming
Shi, Longxing
Liu, Xinning
description Hand detection is an essential step in many tasks, including HCI applications. However, robustly detecting hands against cluttered backgrounds, under motion blur, or in changing light remains a challenging problem. Recently, object detection methods based on CNN models have significantly improved hand detection accuracy, but at a high computational cost. In this paper, we propose a lightweight CNN that uses a modified MobileNet as the feature extractor together with the SSD framework to achieve robust and fast detection of hand location and orientation. The network generates a set of feature maps at various resolutions to detect hands of different sizes. To improve robustness, we also employ a top-down feature fusion architecture that integrates context information across feature levels. To estimate hand orientation accurately with a CNN, we estimate the projections of two orthogonal vectors along the horizontal and vertical axes and then recover the size and orientation of a bounding box exactly enclosing the hand. To deploy the detection algorithm on the Jetson TK1 embedded platform, we optimize the implementations of the building modules in the CNN. Evaluated on the challenging Oxford hand dataset, our method (the code is available at https://github.com/yangli18/hand_detection ) reaches 83.2% average precision at 139 FPS on an NVIDIA Titan X, outperforming previous methods in both accuracy and efficiency. The embedded implementation of our algorithm reaches a processing speed of 16 FPS, which basically meets the requirement of real-time processing.
doi_str_mv 10.1007/s00138-019-01038-4
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2277254921</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2277254921</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-d8ddee7a13900d4b6bd6d45dc139d058bacaed285de45b7abfad0de9ab6437863</originalsourceid><addsrcrecordid>eNp9ULFOwzAQtRBIlMIPMEViDpwdJ47HqgKKVJUFJgbLzl3aVE1S7HTg73EbBBvD6Z793rs7PcZuOdxzAPUQAHhWpsB1LIhInrEJl5lIuSr0OZuAjrgELS7ZVQhbAJBKyQn7mHUJtY4QCZOm3e-opW6wQ9N3SV8n89UqdTZEbmM7TJAGqk7c8dX75ldLYWjaEdrdOjLDpr1mF7XdBbr56VP2_vT4Nl-ky9fnl_lsmVYZ10OKZVxOyvJMA6B0hcMCZY5V_EDIS2crSyjKHEnmTllXWwQkbV0hM1UW2ZTdjXP3vv88xEvMtj_4Lq40QiglcqkFjyoxqirfh-CpNnsfT_ZfhoM5hmjGEE0M0ZxCNDKastEUorhbk_8b_Y_rG--QdoE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2277254921</pqid></control><display><type>article</type><title>An embedded implementation of CNN-based hand detection and orientation estimation algorithm</title><source>Springer Nature - Complete Springer Journals</source><creator>Yang, Li ; Qi, Zhi ; Liu, Zeheng ; Liu, Hao ; Ling, Ming ; Shi, Longxing ; Liu, Xinning</creator><creatorcontrib>Yang, Li ; Qi, Zhi ; Liu, Zeheng ; Liu, Hao ; Ling, Ming ; Shi, Longxing ; Liu, Xinning</creatorcontrib><description>Hand detection is an essential step to support many tasks including HCI applications. However, detecting various hands robustly under conditions of cluttered backgrounds, motion blur or changing light is still a challenging problem. Recently, object detection methods using CNN models have significantly improved the accuracy of hand detection yet at a high computational expense. In this paper, we propose a light CNN network, which uses a modified MobileNet as the feature extractor in company with the SSD framework to achieve robust and fast detection of hand location and orientation. 
The network generates a set of feature maps of various resolutions to detect hands of different sizes. In order to improve the robustness, we also employ a top-down feature fusion architecture that integrates context information across levels of features. For an accurate estimation of hand orientation by CNN, we manage to estimate two orthogonal vectors’ projections along the horizontal and vertical axes and then recover the size and orientation of a bounding box exactly enclosing the hand. In order to deploy the detection algorithm on embedded platform Jetson TK1, we optimize the implementations of the building modules in the CNN network. Evaluated on the challenging Oxford hand dataset, our method (the code is available at https://github.com/yangli18/hand_detection ) reaches 83.2% average precision at 139 FPS on a NVIDIA Titan X, outperforming the previous methods both in accuracy and efficiency. The embedded implementation of our algorithm has reached the processing speed of 16 FPS, which basically meets the requirement of real-time processing.</description><identifier>ISSN: 0932-8092</identifier><identifier>EISSN: 1432-1769</identifier><identifier>DOI: 10.1007/s00138-019-01038-4</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Algorithms ; Communications Engineering ; Computer Science ; Feature extraction ; Feature maps ; Image Processing and Computer Vision ; Model accuracy ; Networks ; Object recognition ; Orientation ; Original Paper ; Pattern Recognition ; Vision systems</subject><ispartof>Machine vision and applications, 2019-09, Vol.30 (6), p.1071-1082</ispartof><rights>Springer-Verlag GmbH Germany, part of Springer Nature 2019</rights><rights>Machine Vision and Applications is a copyright of Springer, (2019). 
All Rights Reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-d8ddee7a13900d4b6bd6d45dc139d058bacaed285de45b7abfad0de9ab6437863</citedby><cites>FETCH-LOGICAL-c319t-d8ddee7a13900d4b6bd6d45dc139d058bacaed285de45b7abfad0de9ab6437863</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s00138-019-01038-4$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s00138-019-01038-4$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids></links><search><creatorcontrib>Yang, Li</creatorcontrib><creatorcontrib>Qi, Zhi</creatorcontrib><creatorcontrib>Liu, Zeheng</creatorcontrib><creatorcontrib>Liu, Hao</creatorcontrib><creatorcontrib>Ling, Ming</creatorcontrib><creatorcontrib>Shi, Longxing</creatorcontrib><creatorcontrib>Liu, Xinning</creatorcontrib><title>An embedded implementation of CNN-based hand detection and orientation estimation algorithm</title><title>Machine vision and applications</title><addtitle>Machine Vision and Applications</addtitle><description>Hand detection is an essential step to support many tasks including HCI applications. However, detecting various hands robustly under conditions of cluttered backgrounds, motion blur or changing light is still a challenging problem. Recently, object detection methods using CNN models have significantly improved the accuracy of hand detection yet at a high computational expense. In this paper, we propose a light CNN network, which uses a modified MobileNet as the feature extractor in company with the SSD framework to achieve robust and fast detection of hand location and orientation. 
The network generates a set of feature maps of various resolutions to detect hands of different sizes. In order to improve the robustness, we also employ a top-down feature fusion architecture that integrates context information across levels of features. For an accurate estimation of hand orientation by CNN, we manage to estimate two orthogonal vectors’ projections along the horizontal and vertical axes and then recover the size and orientation of a bounding box exactly enclosing the hand. In order to deploy the detection algorithm on embedded platform Jetson TK1, we optimize the implementations of the building modules in the CNN network. Evaluated on the challenging Oxford hand dataset, our method (the code is available at https://github.com/yangli18/hand_detection ) reaches 83.2% average precision at 139 FPS on a NVIDIA Titan X, outperforming the previous methods both in accuracy and efficiency. The embedded implementation of our algorithm has reached the processing speed of 16 FPS, which basically meets the requirement of real-time processing.</description><subject>Algorithms</subject><subject>Communications Engineering</subject><subject>Computer Science</subject><subject>Feature extraction</subject><subject>Feature maps</subject><subject>Image Processing and Computer Vision</subject><subject>Model accuracy</subject><subject>Networks</subject><subject>Object recognition</subject><subject>Orientation</subject><subject>Original Paper</subject><subject>Pattern Recognition</subject><subject>Vision 
systems</subject><issn>0932-8092</issn><issn>1432-1769</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNp9ULFOwzAQtRBIlMIPMEViDpwdJ47HqgKKVJUFJgbLzl3aVE1S7HTg73EbBBvD6Z793rs7PcZuOdxzAPUQAHhWpsB1LIhInrEJl5lIuSr0OZuAjrgELS7ZVQhbAJBKyQn7mHUJtY4QCZOm3e-opW6wQ9N3SV8n89UqdTZEbmM7TJAGqk7c8dX75ldLYWjaEdrdOjLDpr1mF7XdBbr56VP2_vT4Nl-ky9fnl_lsmVYZ10OKZVxOyvJMA6B0hcMCZY5V_EDIS2crSyjKHEnmTllXWwQkbV0hM1UW2ZTdjXP3vv88xEvMtj_4Lq40QiglcqkFjyoxqirfh-CpNnsfT_ZfhoM5hmjGEE0M0ZxCNDKastEUorhbk_8b_Y_rG--QdoE</recordid><startdate>20190901</startdate><enddate>20190901</enddate><creator>Yang, Li</creator><creator>Qi, Zhi</creator><creator>Liu, Zeheng</creator><creator>Liu, Hao</creator><creator>Ling, Ming</creator><creator>Shi, Longxing</creator><creator>Liu, Xinning</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20190901</creationdate><title>An embedded implementation of CNN-based hand detection and orientation estimation algorithm</title><author>Yang, Li ; Qi, Zhi ; Liu, Zeheng ; Liu, Hao ; Ling, Ming ; Shi, Longxing ; Liu, Xinning</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-d8ddee7a13900d4b6bd6d45dc139d058bacaed285de45b7abfad0de9ab6437863</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>Communications Engineering</topic><topic>Computer 
Science</topic><topic>Feature extraction</topic><topic>Feature maps</topic><topic>Image Processing and Computer Vision</topic><topic>Model accuracy</topic><topic>Networks</topic><topic>Object recognition</topic><topic>Orientation</topic><topic>Original Paper</topic><topic>Pattern Recognition</topic><topic>Vision systems</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yang, Li</creatorcontrib><creatorcontrib>Qi, Zhi</creatorcontrib><creatorcontrib>Liu, Zeheng</creatorcontrib><creatorcontrib>Liu, Hao</creatorcontrib><creatorcontrib>Ling, Ming</creatorcontrib><creatorcontrib>Shi, Longxing</creatorcontrib><creatorcontrib>Liu, Xinning</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science &amp; Engineering Collection</collection><collection>ProQuest Central UK/Ireland</collection><collection>Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Advanced Technologies &amp; Aerospace Database</collection><collection>ProQuest Advanced Technologies &amp; Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><jtitle>Machine vision and applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yang, Li</au><au>Qi, 
Zhi</au><au>Liu, Zeheng</au><au>Liu, Hao</au><au>Ling, Ming</au><au>Shi, Longxing</au><au>Liu, Xinning</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>An embedded implementation of CNN-based hand detection and orientation estimation algorithm</atitle><jtitle>Machine vision and applications</jtitle><stitle>Machine Vision and Applications</stitle><date>2019-09-01</date><risdate>2019</risdate><volume>30</volume><issue>6</issue><spage>1071</spage><epage>1082</epage><pages>1071-1082</pages><issn>0932-8092</issn><eissn>1432-1769</eissn><abstract>Hand detection is an essential step to support many tasks including HCI applications. However, detecting various hands robustly under conditions of cluttered backgrounds, motion blur or changing light is still a challenging problem. Recently, object detection methods using CNN models have significantly improved the accuracy of hand detection yet at a high computational expense. In this paper, we propose a light CNN network, which uses a modified MobileNet as the feature extractor in company with the SSD framework to achieve robust and fast detection of hand location and orientation. The network generates a set of feature maps of various resolutions to detect hands of different sizes. In order to improve the robustness, we also employ a top-down feature fusion architecture that integrates context information across levels of features. For an accurate estimation of hand orientation by CNN, we manage to estimate two orthogonal vectors’ projections along the horizontal and vertical axes and then recover the size and orientation of a bounding box exactly enclosing the hand. In order to deploy the detection algorithm on embedded platform Jetson TK1, we optimize the implementations of the building modules in the CNN network. 
Evaluated on the challenging Oxford hand dataset, our method (the code is available at https://github.com/yangli18/hand_detection ) reaches 83.2% average precision at 139 FPS on a NVIDIA Titan X, outperforming the previous methods both in accuracy and efficiency. The embedded implementation of our algorithm has reached the processing speed of 16 FPS, which basically meets the requirement of real-time processing.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s00138-019-01038-4</doi><tpages>12</tpages></addata></record>
fulltext fulltext
identifier ISSN: 0932-8092
ispartof Machine vision and applications, 2019-09, Vol.30 (6), p.1071-1082
issn 0932-8092
1432-1769
language eng
recordid cdi_proquest_journals_2277254921
source Springer Nature - Complete Springer Journals
subjects Algorithms
Communications Engineering
Computer Science
Feature extraction
Feature maps
Image Processing and Computer Vision
Model accuracy
Networks
Object recognition
Orientation
Original Paper
Pattern Recognition
Vision systems
title An embedded implementation of CNN-based hand detection and orientation estimation algorithm
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T13%3A32%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20embedded%20implementation%20of%20CNN-based%20hand%20detection%20and%20orientation%20estimation%20algorithm&rft.jtitle=Machine%20vision%20and%20applications&rft.au=Yang,%20Li&rft.date=2019-09-01&rft.volume=30&rft.issue=6&rft.spage=1071&rft.epage=1082&rft.pages=1071-1082&rft.issn=0932-8092&rft.eissn=1432-1769&rft_id=info:doi/10.1007/s00138-019-01038-4&rft_dat=%3Cproquest_cross%3E2277254921%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2277254921&rft_id=info:pmid/&rfr_iscdi=true
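
Note on the orientation step in the description: the abstract says the network estimates the projections of two orthogonal vectors along the horizontal and vertical axes and then recovers the size and orientation of the enclosing box. The record does not give the exact parameterization, so the following is only a minimal sketch under stated assumptions: the two vectors are taken to be half-extents from the box centre, and the function name `recover_box` and its arguments are hypothetical and not taken from the authors' code.

```python
import math


def recover_box(cx, cy, ux, uy, vx, vy):
    """Recover a rotated box from a centre and two half-extent vectors.

    Assumption (not from the paper): (ux, uy) points from the centre to the
    middle of one side along the hand direction, and (vx, vy) is the
    orthogonal half-extent. Both are given by their x and y projections.
    """
    # Side lengths are twice the norm of each half-extent vector.
    length = 2.0 * math.hypot(ux, uy)
    width = 2.0 * math.hypot(vx, vy)
    # Orientation of the first vector, in degrees, measured from the x axis.
    angle = math.degrees(math.atan2(uy, ux))
    # The four corners are the centre plus or minus each half-extent.
    corners = [
        (cx + sx * ux + sy * vx, cy + sx * uy + sy * vy)
        for sx, sy in ((1, 1), (1, -1), (-1, -1), (-1, 1))
    ]
    return length, width, angle, corners


if __name__ == "__main__":
    # An axis-aligned example: half-extents of 2 along x and 1 along y.
    print(recover_box(10.0, 5.0, 2.0, 0.0, 0.0, 1.0))
```

Regressing signed projections rather than a single angle avoids the wrap-around at 0/360 degrees, which may be one reason to prefer such an encoding; whether that was the authors' motivation is not stated in this record.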