LSDNet: lightweight stochastic depth network for human pose estimation

Human pose estimation plays a critical role in human-centred vision applications. Its influence extends to various aspects of daily life, from healthcare diagnostics and sports training to augmented reality experiences and gesture-controlled interfaces. While current approaches have achieved impressive accuracy, their high model complexity and slow detection speeds significantly limit their deployment on edge devices with limited computing power, such as mobile phones and IoT devices. In this paper, we introduce a novel lightweight network for 2D human pose estimation, called lightweight stochastic depth network (LSDNet). Our approach is based on the observation that the majority of HRNet's parameters are located in the middle and later stages of the network. We reduce some unnecessary branches to significantly reduce these parameters. This is achieved by leveraging the Bernoulli distribution to randomly remove these redundant branches, which improves the network's efficiency while also increasing its robustness. To further reduce the network's parameter count, we introduce two lightweight blocks with simple yet effective architectures. These blocks achieve significant parameter reduction while maintaining good accuracy. Furthermore, we leverage coordinate attention to effectively fuse features from different branches and scales. This mechanism captures both inter-channel dependencies and spatial context, enabling the network to accurately localize keypoints across the human body. We evaluated the effectiveness of our method on the MPII and COCO datasets, demonstrating superior results on human pose estimation compared to popular lightweight networks. Our code is available at: https://github.com/illusory2333/LSDNet
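The branch-removal mechanism described in the abstract, keeping or dropping a redundant parallel branch according to a Bernoulli draw during training, follows the general stochastic-depth idea. The following is a minimal PyTorch sketch of that idea for illustration only; the module and parameter names (StochasticBranch, keep_prob) are assumptions, not the authors' implementation, and the actual code is in the repository linked above.

```python
# Minimal sketch of Bernoulli-gated branch dropping (stochastic depth).
# Illustrative only: names and defaults are assumptions, not the LSDNet code.
import torch
import torch.nn as nn


class StochasticBranch(nn.Module):
    """Wraps a shape-preserving branch and randomly drops it during training."""

    def __init__(self, branch: nn.Module, keep_prob: float = 0.8):
        super().__init__()
        self.branch = branch
        self.keep_prob = keep_prob

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if self.training:
            # Bernoulli draw: 1 -> keep the branch, 0 -> skip it for this batch.
            keep = torch.bernoulli(torch.tensor(self.keep_prob, device=x.device))
            if keep == 0:
                return torch.zeros_like(x)  # dropped branch contributes nothing
            return self.branch(x)
        # At inference, scale by keep_prob so the expected contribution
        # matches training (the usual stochastic-depth convention).
        return self.keep_prob * self.branch(x)


if __name__ == "__main__":
    # Toy usage: a 3x3 conv branch on a feature map, dropped ~20% of the time.
    branch = nn.Conv2d(32, 32, kernel_size=3, padding=1)
    sd_branch = StochasticBranch(branch, keep_prob=0.8)
    feats = torch.randn(2, 32, 64, 48)
    out = feats + sd_branch(feats)  # fuse the (possibly dropped) branch output
    print(out.shape)  # torch.Size([2, 32, 64, 48])
```

In this sketch the branch must preserve the input shape so a dropped branch can be replaced by zeros; scaling the branch output by keep_prob at inference keeps its expected contribution consistent with training.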

Bibliographic Details
Published in: The Visual Computer, 2025-01, Vol. 41 (1), p. 257-270
Main authors: Zhang, Hengrui; Qi, Yongfeng; Chen, Huili; Cao, Panpan; Liang, Anye; Wen, Shengcong
Format: Article
Language: English
Subjects: Accuracy; Artificial Intelligence; Augmented reality; Computer Graphics; Computer Science; Effectiveness; Image Processing and Computer Vision; Lightweight; Methods; Neural networks; Parameter robustness; Pose estimation; Semantics; Weight reduction
Online access: Full text
DOI: 10.1007/s00371-024-03323-4
ISSN: 0178-2789
EISSN: 1432-2315
Publisher: Springer Berlin Heidelberg (Berlin/Heidelberg)