LSDNet: lightweight stochastic depth network for human pose estimation

Human pose estimation plays a critical role in human-centred vision applications. Its influence extends to various aspects of daily life, from healthcare diagnostics and sports training to augmented reality experiences and gesture-controlled interfaces. While current approaches have achieved impressive accuracy, their high model complexity and slow detection speeds significantly limit their deployment on edge devices with limited computing power, such as mobile phones and IoT devices. In this paper, we introduce a novel lightweight network for 2D human pose estimation, called lightweight stochastic depth network (LSDNet). Our approach is based on the observation that the majority of HRNet's parameters are located in the middle and later stages of the network. We reduce some unnecessary branches to significantly reduce these parameters. This is achieved by leveraging the Bernoulli distribution to randomly remove these redundant branches, which improves the network's efficiency while also increasing its robustness. To further reduce the network's parameter count, we introduce two lightweight blocks with simple yet effective architectures. These blocks achieve significant parameter reduction while maintaining good accuracy. Furthermore, we leverage coordinate attention to effectively fuse features from different branches and scales. This mechanism captures both inter-channel dependencies and spatial context, enabling the network to accurately localize keypoints across the human body. We evaluated the effectiveness of our method on the MPII and COCO datasets, demonstrating superior results on human pose estimation compared to popular lightweight networks. Our code is available at: https://github.com/illusory2333/LSDNet
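The branch-removal mechanism described in the abstract, keeping or dropping a redundant parallel branch according to a Bernoulli draw during training, follows the general stochastic-depth idea. The following is a minimal PyTorch sketch of that idea for illustration only; the module and parameter names (StochasticBranch, keep_prob) are assumptions, not the authors' implementation, and the actual code is in the repository linked above.

```python
# Minimal sketch of Bernoulli-gated branch dropping (stochastic depth).
# Illustrative only: names and defaults are assumptions, not the LSDNet code.
import torch
import torch.nn as nn


class StochasticBranch(nn.Module):
    """Wraps a shape-preserving branch and randomly drops it during training."""

    def __init__(self, branch: nn.Module, keep_prob: float = 0.8):
        super().__init__()
        self.branch = branch
        self.keep_prob = keep_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Bernoulli draw: 1 -> keep the branch, 0 -> skip it for this batch.
            keep = torch.bernoulli(torch.tensor(self.keep_prob, device=x.device))
            if keep == 0:
                return torch.zeros_like(x)  # dropped branch contributes nothing
            return self.branch(x)
        # At inference, scale by keep_prob so the expected contribution
        # matches training (the usual stochastic-depth convention).
        return self.keep_prob * self.branch(x)


if __name__ == "__main__":
    # Toy usage: a 3x3 conv branch on a feature map, dropped ~20% of the time.
    branch = nn.Conv2d(32, 32, kernel_size=3, padding=1)
    sd_branch = StochasticBranch(branch, keep_prob=0.8)
    feats = torch.randn(2, 32, 64, 48)
    out = feats + sd_branch(feats)  # fuse the (possibly dropped) branch output
    print(out.shape)  # torch.Size([2, 32, 64, 48])
```

In this sketch the branch must preserve the input shape so a dropped branch can be replaced by zeros; scaling the branch output by keep_prob at inference keeps its expected contribution consistent with training.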

Bibliographic Details
Published in: The Visual Computer, 2025-01, Vol. 41 (1), p. 257-270
Main authors: Zhang, Hengrui; Qi, Yongfeng; Chen, Huili; Cao, Panpan; Liang, Anye; Wen, Shengcong
Format: Article
Language: English
Subjects: Accuracy; Artificial Intelligence; Augmented reality; Computer Graphics; Computer Science; Effectiveness; Image Processing and Computer Vision; Lightweight; Methods; Neural networks; Parameter robustness; Pose estimation; Semantics; Weight reduction
Online access: Full text
DOI: 10.1007/s00371-024-03323-4
ISSN: 0178-2789
EISSN: 1432-2315
Publisher: Springer Berlin Heidelberg (Berlin/Heidelberg)