An embedded implementation of CNN-based hand detection and orientation estimation algorithm
Hand detection is an essential step to support many tasks including HCI applications. However, detecting various hands robustly under conditions of cluttered backgrounds, motion blur or changing light is still a challenging problem. Recently, object detection methods using CNN models have significan...
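Published in: | Machine vision and applications 2019-09, Vol.30 (6), p.1071-1082 |
---|---|
Main authors: | Yang, Li ; Qi, Zhi ; Liu, Zeheng ; Liu, Hao ; Ling, Ming ; Shi, Longxing ; Liu, Xinning |
Format: | Article |
Language: | eng |
Subjects: | Algorithms ; Communications Engineering ; Computer Science ; Feature extraction ; Feature maps ; Image Processing and Computer Vision ; Model accuracy ; Networks ; Object recognition ; Orientation ; Original Paper ; Pattern Recognition ; Vision systems |
Online access: | Full text |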
container_end_page | 1082 |
---|---|
container_issue | 6 |
container_start_page | 1071 |
container_title | Machine vision and applications |
container_volume | 30 |
creator | Yang, Li Qi, Zhi Liu, Zeheng Liu, Hao Ling, Ming Shi, Longxing Liu, Xinning |
description | Hand detection is an essential step in many tasks, including human-computer interaction (HCI) applications. However, robustly detecting hands against cluttered backgrounds, under motion blur or under changing light remains a challenging problem. Recently, object detection methods using CNN models have significantly improved the accuracy of hand detection, but at a high computational cost. In this paper, we propose a lightweight CNN that combines a modified MobileNet feature extractor with the SSD framework to achieve robust and fast detection of hand location and orientation. The network generates a set of feature maps of various resolutions to detect hands of different sizes. To improve robustness, we also employ a top-down feature fusion architecture that integrates context information across feature levels. To estimate hand orientation accurately, the CNN predicts the projections of two orthogonal vectors along the horizontal and vertical axes, from which we recover the size and orientation of a bounding box exactly enclosing the hand. To deploy the detection algorithm on the Jetson TK1 embedded platform, we optimize the implementations of the building modules in the network. Evaluated on the challenging Oxford hand dataset, our method (the code is available at https://github.com/yangli18/hand_detection) reaches 83.2% average precision at 139 FPS on an NVIDIA Titan X, outperforming previous methods in both accuracy and efficiency. The embedded implementation of our algorithm runs at 16 FPS, which basically meets the requirement of real-time processing. |
doi_str_mv | 10.1007/s00138-019-01038-4 |
format | Article |
fullrecord | <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_2277254921</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2277254921</sourcerecordid><originalsourceid>FETCH-LOGICAL-c319t-d8ddee7a13900d4b6bd6d45dc139d058bacaed285de45b7abfad0de9ab6437863</originalsourceid><addsrcrecordid>eNp9ULFOwzAQtRBIlMIPMEViDpwdJ47HqgKKVJUFJgbLzl3aVE1S7HTg73EbBBvD6Z793rs7PcZuOdxzAPUQAHhWpsB1LIhInrEJl5lIuSr0OZuAjrgELS7ZVQhbAJBKyQn7mHUJtY4QCZOm3e-opW6wQ9N3SV8n89UqdTZEbmM7TJAGqk7c8dX75ldLYWjaEdrdOjLDpr1mF7XdBbr56VP2_vT4Nl-ky9fnl_lsmVYZ10OKZVxOyvJMA6B0hcMCZY5V_EDIS2crSyjKHEnmTllXWwQkbV0hM1UW2ZTdjXP3vv88xEvMtj_4Lq40QiglcqkFjyoxqirfh-CpNnsfT_ZfhoM5hmjGEE0M0ZxCNDKastEUorhbk_8b_Y_rG--QdoE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2277254921</pqid></control><display><type>article</type><title>An embedded implementation of CNN-based hand detection and orientation estimation algorithm</title><source>Springer Nature - Complete Springer Journals</source><creator>Yang, Li ; Qi, Zhi ; Liu, Zeheng ; Liu, Hao ; Ling, Ming ; Shi, Longxing ; Liu, Xinning</creator><creatorcontrib>Yang, Li ; Qi, Zhi ; Liu, Zeheng ; Liu, Hao ; Ling, Ming ; Shi, Longxing ; Liu, Xinning</creatorcontrib><description>Hand detection is an essential step to support many tasks including HCI applications. However, detecting various hands robustly under conditions of cluttered backgrounds, motion blur or changing light is still a challenging problem. Recently, object detection methods using CNN models have significantly improved the accuracy of hand detection yet at a high computational expense. In this paper, we propose a light CNN network, which uses a modified MobileNet as the feature extractor in company with the SSD framework to achieve robust and fast detection of hand location and orientation. 
The network generates a set of feature maps of various resolutions to detect hands of different sizes. In order to improve the robustness, we also employ a top-down feature fusion architecture that integrates context information across levels of features. For an accurate estimation of hand orientation by CNN, we manage to estimate two orthogonal vectors’ projections along the horizontal and vertical axes and then recover the size and orientation of a bounding box exactly enclosing the hand. In order to deploy the detection algorithm on embedded platform Jetson TK1, we optimize the implementations of the building modules in the CNN network. Evaluated on the challenging Oxford hand dataset, our method (the code is available at
https://github.com/yangli18/hand_detection
) reaches 83.2% average precision at 139 FPS on a NVIDIA Titan X, outperforming the previous methods both in accuracy and efficiency. The embedded implementation of our algorithm has reached the processing speed of 16 FPS, which basically meets the requirement of real-time processing.</description><identifier>ISSN: 0932-8092</identifier><identifier>EISSN: 1432-1769</identifier><identifier>DOI: 10.1007/s00138-019-01038-4</identifier><language>eng</language><publisher>Berlin/Heidelberg: Springer Berlin Heidelberg</publisher><subject>Algorithms ; Communications Engineering ; Computer Science ; Feature extraction ; Feature maps ; Image Processing and Computer Vision ; Model accuracy ; Networks ; Object recognition ; Orientation ; Original Paper ; Pattern Recognition ; Vision systems</subject><ispartof>Machine vision and applications, 2019-09, Vol.30 (6), p.1071-1082</ispartof><rights>Springer-Verlag GmbH Germany, part of Springer Nature 2019</rights><rights>Machine Vision and Applications is a copyright of Springer, (2019). 
All Rights Reserved.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c319t-d8ddee7a13900d4b6bd6d45dc139d058bacaed285de45b7abfad0de9ab6437863</citedby><cites>FETCH-LOGICAL-c319t-d8ddee7a13900d4b6bd6d45dc139d058bacaed285de45b7abfad0de9ab6437863</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s00138-019-01038-4$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s00138-019-01038-4$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,776,780,27901,27902,41464,42533,51294</link.rule.ids></links><search><creatorcontrib>Yang, Li</creatorcontrib><creatorcontrib>Qi, Zhi</creatorcontrib><creatorcontrib>Liu, Zeheng</creatorcontrib><creatorcontrib>Liu, Hao</creatorcontrib><creatorcontrib>Ling, Ming</creatorcontrib><creatorcontrib>Shi, Longxing</creatorcontrib><creatorcontrib>Liu, Xinning</creatorcontrib><title>An embedded implementation of CNN-based hand detection and orientation estimation algorithm</title><title>Machine vision and applications</title><addtitle>Machine Vision and Applications</addtitle><description>Hand detection is an essential step to support many tasks including HCI applications. However, detecting various hands robustly under conditions of cluttered backgrounds, motion blur or changing light is still a challenging problem. Recently, object detection methods using CNN models have significantly improved the accuracy of hand detection yet at a high computational expense. In this paper, we propose a light CNN network, which uses a modified MobileNet as the feature extractor in company with the SSD framework to achieve robust and fast detection of hand location and orientation. 
The network generates a set of feature maps of various resolutions to detect hands of different sizes. In order to improve the robustness, we also employ a top-down feature fusion architecture that integrates context information across levels of features. For an accurate estimation of hand orientation by CNN, we manage to estimate two orthogonal vectors’ projections along the horizontal and vertical axes and then recover the size and orientation of a bounding box exactly enclosing the hand. In order to deploy the detection algorithm on embedded platform Jetson TK1, we optimize the implementations of the building modules in the CNN network. Evaluated on the challenging Oxford hand dataset, our method (the code is available at
https://github.com/yangli18/hand_detection
) reaches 83.2% average precision at 139 FPS on a NVIDIA Titan X, outperforming the previous methods both in accuracy and efficiency. The embedded implementation of our algorithm has reached the processing speed of 16 FPS, which basically meets the requirement of real-time processing.</description><subject>Algorithms</subject><subject>Communications Engineering</subject><subject>Computer Science</subject><subject>Feature extraction</subject><subject>Feature maps</subject><subject>Image Processing and Computer Vision</subject><subject>Model accuracy</subject><subject>Networks</subject><subject>Object recognition</subject><subject>Orientation</subject><subject>Original Paper</subject><subject>Pattern Recognition</subject><subject>Vision systems</subject><issn>0932-8092</issn><issn>1432-1769</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>BENPR</sourceid><recordid>eNp9ULFOwzAQtRBIlMIPMEViDpwdJ47HqgKKVJUFJgbLzl3aVE1S7HTg73EbBBvD6Z793rs7PcZuOdxzAPUQAHhWpsB1LIhInrEJl5lIuSr0OZuAjrgELS7ZVQhbAJBKyQn7mHUJtY4QCZOm3e-opW6wQ9N3SV8n89UqdTZEbmM7TJAGqk7c8dX75ldLYWjaEdrdOjLDpr1mF7XdBbr56VP2_vT4Nl-ky9fnl_lsmVYZ10OKZVxOyvJMA6B0hcMCZY5V_EDIS2crSyjKHEnmTllXWwQkbV0hM1UW2ZTdjXP3vv88xEvMtj_4Lq40QiglcqkFjyoxqirfh-CpNnsfT_ZfhoM5hmjGEE0M0ZxCNDKastEUorhbk_8b_Y_rG--QdoE</recordid><startdate>20190901</startdate><enddate>20190901</enddate><creator>Yang, Li</creator><creator>Qi, Zhi</creator><creator>Liu, Zeheng</creator><creator>Liu, Hao</creator><creator>Ling, Ming</creator><creator>Shi, Longxing</creator><creator>Liu, Xinning</creator><general>Springer Berlin Heidelberg</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>AFKRA</scope><scope>ARAPS</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>P5Z</scope><scope>P62</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20190901</creationdate><title>An embedded implementation of CNN-based hand detection and orientation estimation algorithm</title><author>Yang, Li ; Qi, Zhi ; Liu, Zeheng ; Liu, Hao ; Ling, Ming ; Shi, Longxing ; Liu, Xinning</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c319t-d8ddee7a13900d4b6bd6d45dc139d058bacaed285de45b7abfad0de9ab6437863</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Algorithms</topic><topic>Communications Engineering</topic><topic>Computer Science</topic><topic>Feature extraction</topic><topic>Feature maps</topic><topic>Image Processing and Computer Vision</topic><topic>Model accuracy</topic><topic>Networks</topic><topic>Object recognition</topic><topic>Orientation</topic><topic>Original Paper</topic><topic>Pattern Recognition</topic><topic>Vision systems</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Yang, Li</creatorcontrib><creatorcontrib>Qi, Zhi</creatorcontrib><creatorcontrib>Liu, Zeheng</creatorcontrib><creatorcontrib>Liu, Hao</creatorcontrib><creatorcontrib>Ling, Ming</creatorcontrib><creatorcontrib>Shi, Longxing</creatorcontrib><creatorcontrib>Liu, Xinning</creatorcontrib><collection>CrossRef</collection><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central 
UK/Ireland</collection><collection>Advanced Technologies & Aerospace Collection</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Advanced Technologies & Aerospace Database</collection><collection>ProQuest Advanced Technologies & Aerospace Collection</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection><jtitle>Machine vision and applications</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Yang, Li</au><au>Qi, Zhi</au><au>Liu, Zeheng</au><au>Liu, Hao</au><au>Ling, Ming</au><au>Shi, Longxing</au><au>Liu, Xinning</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>An embedded implementation of CNN-based hand detection and orientation estimation algorithm</atitle><jtitle>Machine vision and applications</jtitle><stitle>Machine Vision and Applications</stitle><date>2019-09-01</date><risdate>2019</risdate><volume>30</volume><issue>6</issue><spage>1071</spage><epage>1082</epage><pages>1071-1082</pages><issn>0932-8092</issn><eissn>1432-1769</eissn><abstract>Hand detection is an essential step to support many tasks including HCI applications. However, detecting various hands robustly under conditions of cluttered backgrounds, motion blur or changing light is still a challenging problem. Recently, object detection methods using CNN models have significantly improved the accuracy of hand detection yet at a high computational expense. 
In this paper, we propose a light CNN network, which uses a modified MobileNet as the feature extractor in company with the SSD framework to achieve robust and fast detection of hand location and orientation. The network generates a set of feature maps of various resolutions to detect hands of different sizes. In order to improve the robustness, we also employ a top-down feature fusion architecture that integrates context information across levels of features. For an accurate estimation of hand orientation by CNN, we manage to estimate two orthogonal vectors’ projections along the horizontal and vertical axes and then recover the size and orientation of a bounding box exactly enclosing the hand. In order to deploy the detection algorithm on embedded platform Jetson TK1, we optimize the implementations of the building modules in the CNN network. Evaluated on the challenging Oxford hand dataset, our method (the code is available at
https://github.com/yangli18/hand_detection
) reaches 83.2% average precision at 139 FPS on a NVIDIA Titan X, outperforming the previous methods both in accuracy and efficiency. The embedded implementation of our algorithm has reached the processing speed of 16 FPS, which basically meets the requirement of real-time processing.</abstract><cop>Berlin/Heidelberg</cop><pub>Springer Berlin Heidelberg</pub><doi>10.1007/s00138-019-01038-4</doi><tpages>12</tpages></addata></record> |
fulltext | fulltext |
identifier | ISSN: 0932-8092 |
ispartof | Machine vision and applications, 2019-09, Vol.30 (6), p.1071-1082 |
issn | 0932-8092 1432-1769 |
language | eng |
recordid | cdi_proquest_journals_2277254921 |
source | Springer Nature - Complete Springer Journals |
subjects | Algorithms Communications Engineering Computer Science Feature extraction Feature maps Image Processing and Computer Vision Model accuracy Networks Object recognition Orientation Original Paper Pattern Recognition Vision systems |
title | An embedded implementation of CNN-based hand detection and orientation estimation algorithm |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-30T13%3A32%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=An%20embedded%20implementation%20of%20CNN-based%20hand%20detection%20and%20orientation%20estimation%20algorithm&rft.jtitle=Machine%20vision%20and%20applications&rft.au=Yang,%20Li&rft.date=2019-09-01&rft.volume=30&rft.issue=6&rft.spage=1071&rft.epage=1082&rft.pages=1071-1082&rft.issn=0932-8092&rft.eissn=1432-1769&rft_id=info:doi/10.1007/s00138-019-01038-4&rft_dat=%3Cproquest_cross%3E2277254921%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2277254921&rft_id=info:pmid/&rfr_iscdi=true |
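
The abstract states that the size and orientation of the hand's bounding box are recovered from the horizontal and vertical projections of two orthogonal vectors. The record does not give the exact parameterization, so the Python sketch below is only an illustration under stated assumptions: the two vectors are taken to run from the box center to the midpoints of two adjacent sides, and the function names (`recover_rotated_box`, `orthogonality_error`) are hypothetical and not taken from the authors' repository.

```python
import math


def recover_rotated_box(cx, cy, v1, v2):
    """Recover a rotated box from two half-axis vectors.

    Assumption (not confirmed by the record): v1 and v2 are the (x, y)
    projections of vectors pointing from the box center (cx, cy) to the
    midpoints of two adjacent sides.
    """
    x1, y1 = v1
    x2, y2 = v2
    # Each vector spans half of one side, so the side length is twice its norm.
    width = 2.0 * math.hypot(x1, y1)
    height = 2.0 * math.hypot(x2, y2)
    # Orientation is the angle of the first vector relative to the x axis.
    angle_deg = math.degrees(math.atan2(y1, x1))
    # The four corners are center +/- v1 +/- v2.
    corners = [
        (cx + sx * x1 + sy * x2, cy + sx * y1 + sy * y2)
        for sx, sy in ((1, 1), (1, -1), (-1, -1), (-1, 1))
    ]
    return width, height, angle_deg, corners


def orthogonality_error(v1, v2):
    """Absolute cosine of the angle between v1 and v2 (0 means orthogonal)."""
    dot = v1[0] * v2[0] + v1[1] * v2[1]
    norm = math.hypot(*v1) * math.hypot(*v2)
    return abs(dot) / norm if norm > 0 else 0.0


if __name__ == "__main__":
    w, h, angle, corners = recover_rotated_box(0.0, 0.0, (3.0, 4.0), (-2.0, 1.5))
    print(w, h, angle, corners)
```

Predicted vectors will rarely be exactly orthogonal in practice, so a check such as `orthogonality_error` could be used to flag or correct inconsistent predictions before the box is drawn. Whether the authors apply such a step is not stated in this record.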