HDA-pose: a real-time 2D human pose estimation method based on modified YOLOv8

2D human pose estimation aims to accurately regress the keypoints of human body from images or videos. However, it remains challenging due to the occlusion and intersection among multiple individuals and the difficulty of dealing with different body scales. In order to better tackle these issues, we...

Detailed description

Bibliographic details
Published in: Signal, image and video processing, 2024-09, Vol.18 (8-9), p.5823-5839
Main authors: Dong, Chengang; Tang, Yuhao; Zhang, Liyan
Format: Article
Language: eng
Subjects:
Online access: Full text
container_end_page 5839
container_issue 8-9
container_start_page 5823
container_title Signal, image and video processing
container_volume 18
creator Dong, Chengang
Tang, Yuhao
Zhang, Liyan
description 2D human pose estimation aims to accurately regress the keypoints of human body from images or videos. However, it remains challenging due to the occlusion and intersection among multiple individuals and the difficulty of dealing with different body scales. In order to better tackle these issues, we propose a human pose estimation framework named HDA-Pose. By improving the real-time framework of YOLOv8, we achieve simultaneous regression of all individuals' keypoint locations in the image. Specifically, we propose the High-Grade Dual Attention (HDA) module to further enhance the focus of YOLOv8 on important features of individuals in the image. Additionally, we improve the original data augmentation strategy in YOLOv8 to better simulate cases where key points of individuals are occluded in the image. Lastly, we introduce a novel regression loss metric, Vertex Intersection over Union, to further enhance the effectiveness of the model in multi-person pose estimation. Our approach attains competitive results on multiple metrics of two open-source datasets, MS COCO 2017 and CrowdPose. Compared with the baseline model YOLOv8x-pose, HDA-Pose improves the average precision by 2.9% and 3.3% on the two datasets, respectively.
doi_str_mv 10.1007/s11760-024-03274-2
format Article
fullrecord <record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_journals_3086029658</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>3086029658</sourcerecordid><originalsourceid>FETCH-LOGICAL-c200t-88de191467af8185ab01745f3c75074711611bb1c3f3695dade012a19deb697f3</originalsourceid><addsrcrecordid>eNp9UD1PwzAQtRBIVKV_gMkSs-HOTmyHrWqBIlV0gYHJcmKHpmqTYqdI_HtcgmDjlvt67z4eIZcI1wigbiKiksCAZwwEVxnjJ2SEWgqGCvH0NwZxTiYxbiBZwmmpR-RpMZ-yfRf9LbU0eLtlfbPzlM_p-rCzLT22qI-paPuma-nO9-vO0dJG7-gx71xTNyl-XS1XH_qCnNV2G_3kx4_Jy_3d82zBlquHx9l0ySoO0DOtnccCM6lsrVHntgRUWV6LSuWgsnS1RCxLrEQtZJE76zwgt1g4X8pC1WJMroa5-9C9H9J9ZtMdQptWGgFaAi9krhOKD6gqdDEGX5t9SI-ET4NgjtKZQTqTpDPf0hmeSGIgxQRu33z4G_0P6wsVgW4f</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3086029658</pqid></control><display><type>article</type><title>HDA-pose: a real-time 2D human pose estimation method based on modified YOLOv8</title><source>SpringerLink Journals - AutoHoldings</source><creator>Dong, Chengang ; Tang, Yuhao ; Zhang, Liyan</creator><creatorcontrib>Dong, Chengang ; Tang, Yuhao ; Zhang, Liyan</creatorcontrib><description>2D human pose estimation aims to accurately regress the keypoints of human body from images or videos. However, it remains challenging due to the occlusion and intersection among multiple individuals and the difficulty of dealing with different body scales. In order to better tackle these issues, we propose a human pose estimation framework named HDA-Pose. By improving the real-time framework of YOLOv8, we achieve simultaneous regression of all individuals' keypoint locations in the image. Specifically, we propose the High-Grade Dual Attention (HDA) module to further enhance the focus of YOLOv8 on important features of individuals in the image. 
Additionally, we improve the original data augmentation strategy in YOLOv8 to better simulate cases where key points of individuals are occluded in the image. Lastly, we introduce a novel regression loss metric, Vertex Intersection over Union, to further enhance the effectiveness of the model in multi-person pose estimation. Our approach attains competitive results on multiple metrics of two open-source datasets, MS COCO 2017 and CrowdPose. Compared with the baseline model YOLOv8x-pose, HDA-Pose improves the average precision by 2.9% and 3.3% on the two datasets, respectively.</description><identifier>ISSN: 1863-1703</identifier><identifier>EISSN: 1863-1711</identifier><identifier>DOI: 10.1007/s11760-024-03274-2</identifier><language>eng</language><publisher>London: Springer London</publisher><subject>Computer Imaging ; Computer Science ; Data augmentation ; Datasets ; Image enhancement ; Image Processing and Computer Vision ; Multimedia Information Systems ; Occlusion ; Original Paper ; Pattern Recognition and Graphics ; Pose estimation ; Real time ; Regression models ; Signal,Image and Speech Processing ; Two dimensional bodies ; Vision</subject><ispartof>Signal, image and video processing, 2024-09, Vol.18 (8-9), p.5823-5839</ispartof><rights>The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024. Springer Nature or its licensor (e.g. 
a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c200t-88de191467af8185ab01745f3c75074711611bb1c3f3695dade012a19deb697f3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://link.springer.com/content/pdf/10.1007/s11760-024-03274-2$$EPDF$$P50$$Gspringer$$H</linktopdf><linktohtml>$$Uhttps://link.springer.com/10.1007/s11760-024-03274-2$$EHTML$$P50$$Gspringer$$H</linktohtml><link.rule.ids>314,780,784,27924,27925,41488,42557,51319</link.rule.ids></links><search><creatorcontrib>Dong, Chengang</creatorcontrib><creatorcontrib>Tang, Yuhao</creatorcontrib><creatorcontrib>Zhang, Liyan</creatorcontrib><title>HDA-pose: a real-time 2D human pose estimation method based on modified YOLOv8</title><title>Signal, image and video processing</title><addtitle>SIViP</addtitle><description>2D human pose estimation aims to accurately regress the keypoints of human body from images or videos. However, it remains challenging due to the occlusion and intersection among multiple individuals and the difficulty of dealing with different body scales. In order to better tackle these issues, we propose a human pose estimation framework named HDA-Pose. By improving the real-time framework of YOLOv8, we achieve simultaneous regression of all individuals' keypoint locations in the image. Specifically, we propose the High-Grade Dual Attention (HDA) module to further enhance the focus of YOLOv8 on important features of individuals in the image. 
Additionally, we improve the original data augmentation strategy in YOLOv8 to better simulate cases where key points of individuals are occluded in the image. Lastly, we introduce a novel regression loss metric, Vertex Intersection over Union, to further enhance the effectiveness of the model in multi-person pose estimation. Our approach attains competitive results on multiple metrics of two open-source datasets, MS COCO 2017 and CrowdPose. Compared with the baseline model YOLOv8x-pose, HDA-Pose improves the average precision by 2.9% and 3.3% on the two datasets, respectively.</description><subject>Computer Imaging</subject><subject>Computer Science</subject><subject>Data augmentation</subject><subject>Datasets</subject><subject>Image enhancement</subject><subject>Image Processing and Computer Vision</subject><subject>Multimedia Information Systems</subject><subject>Occlusion</subject><subject>Original Paper</subject><subject>Pattern Recognition and Graphics</subject><subject>Pose estimation</subject><subject>Real time</subject><subject>Regression models</subject><subject>Signal,Image and Speech Processing</subject><subject>Two dimensional bodies</subject><subject>Vision</subject><issn>1863-1703</issn><issn>1863-1711</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><recordid>eNp9UD1PwzAQtRBIVKV_gMkSs-HOTmyHrWqBIlV0gYHJcmKHpmqTYqdI_HtcgmDjlvt67z4eIZcI1wigbiKiksCAZwwEVxnjJ2SEWgqGCvH0NwZxTiYxbiBZwmmpR-RpMZ-yfRf9LbU0eLtlfbPzlM_p-rCzLT22qI-paPuma-nO9-vO0dJG7-gx71xTNyl-XS1XH_qCnNV2G_3kx4_Jy_3d82zBlquHx9l0ySoO0DOtnccCM6lsrVHntgRUWV6LSuWgsnS1RCxLrEQtZJE76zwgt1g4X8pC1WJMroa5-9C9H9J9ZtMdQptWGgFaAi9krhOKD6gqdDEGX5t9SI-ET4NgjtKZQTqTpDPf0hmeSGIgxQRu33z4G_0P6wsVgW4f</recordid><startdate>20240901</startdate><enddate>20240901</enddate><creator>Dong, Chengang</creator><creator>Tang, Yuhao</creator><creator>Zhang, Liyan</creator><general>Springer London</general><general>Springer Nature 
B.V</general><scope>AAYXX</scope><scope>CITATION</scope></search><sort><creationdate>20240901</creationdate><title>HDA-pose: a real-time 2D human pose estimation method based on modified YOLOv8</title><author>Dong, Chengang ; Tang, Yuhao ; Zhang, Liyan</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c200t-88de191467af8185ab01745f3c75074711611bb1c3f3695dade012a19deb697f3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Imaging</topic><topic>Computer Science</topic><topic>Data augmentation</topic><topic>Datasets</topic><topic>Image enhancement</topic><topic>Image Processing and Computer Vision</topic><topic>Multimedia Information Systems</topic><topic>Occlusion</topic><topic>Original Paper</topic><topic>Pattern Recognition and Graphics</topic><topic>Pose estimation</topic><topic>Real time</topic><topic>Regression models</topic><topic>Signal,Image and Speech Processing</topic><topic>Two dimensional bodies</topic><topic>Vision</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Dong, Chengang</creatorcontrib><creatorcontrib>Tang, Yuhao</creatorcontrib><creatorcontrib>Zhang, Liyan</creatorcontrib><collection>CrossRef</collection><jtitle>Signal, image and video processing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Dong, Chengang</au><au>Tang, Yuhao</au><au>Zhang, Liyan</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>HDA-pose: a real-time 2D human pose estimation method based on modified YOLOv8</atitle><jtitle>Signal, image and video processing</jtitle><stitle>SIViP</stitle><date>2024-09-01</date><risdate>2024</risdate><volume>18</volume><issue>8-9</issue><spage>5823</spage><epage>5839</epage><pages>5823-5839</pages><issn>1863-1703</issn><eissn>1863-1711</eissn><abstract>2D human pose 
estimation aims to accurately regress the keypoints of human body from images or videos. However, it remains challenging due to the occlusion and intersection among multiple individuals and the difficulty of dealing with different body scales. In order to better tackle these issues, we propose a human pose estimation framework named HDA-Pose. By improving the real-time framework of YOLOv8, we achieve simultaneous regression of all individuals' keypoint locations in the image. Specifically, we propose the High-Grade Dual Attention (HDA) module to further enhance the focus of YOLOv8 on important features of individuals in the image. Additionally, we improve the original data augmentation strategy in YOLOv8 to better simulate cases where key points of individuals are occluded in the image. Lastly, we introduce a novel regression loss metric, Vertex Intersection over Union, to further enhance the effectiveness of the model in multi-person pose estimation. Our approach attains competitive results on multiple metrics of two open-source datasets, MS COCO 2017 and CrowdPose. Compared with the baseline model YOLOv8x-pose, HDA-Pose improves the average precision by 2.9% and 3.3% on the two datasets, respectively.</abstract><cop>London</cop><pub>Springer London</pub><doi>10.1007/s11760-024-03274-2</doi><tpages>17</tpages></addata></record>
fulltext fulltext
identifier ISSN: 1863-1703
ispartof Signal, image and video processing, 2024-09, Vol.18 (8-9), p.5823-5839
issn 1863-1703
1863-1711
language eng
recordid cdi_proquest_journals_3086029658
source SpringerLink Journals - AutoHoldings
subjects Computer Imaging
Computer Science
Data augmentation
Datasets
Image enhancement
Image Processing and Computer Vision
Multimedia Information Systems
Occlusion
Original Paper
Pattern Recognition and Graphics
Pose estimation
Real time
Regression models
Signal,Image and Speech Processing
Two dimensional bodies
Vision
title HDA-pose: a real-time 2D human pose estimation method based on modified YOLOv8
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T19%3A45%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=HDA-pose:%20a%20real-time%202D%20human%20pose%20estimation%20method%20based%20on%20modified%20YOLOv8&rft.jtitle=Signal,%20image%20and%20video%20processing&rft.au=Dong,%20Chengang&rft.date=2024-09-01&rft.volume=18&rft.issue=8-9&rft.spage=5823&rft.epage=5839&rft.pages=5823-5839&rft.issn=1863-1703&rft.eissn=1863-1711&rft_id=info:doi/10.1007/s11760-024-03274-2&rft_dat=%3Cproquest_cross%3E3086029658%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3086029658&rft_id=info:pmid/&rfr_iscdi=true
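The description above reports average precision on MS COCO 2017 and CrowdPose without defining the metric. Assuming the standard COCO keypoint protocol applies, AP is derived from Object Keypoint Similarity (OKS) between predicted and ground-truth poses. The following is a minimal sketch of OKS for a single pose pair; the function name and argument layout are illustrative assumptions and are not taken from the article.

```python
import math


def object_keypoint_similarity(pred, truth, visibility, sigmas, area):
    """OKS between one predicted pose and one ground-truth pose.

    pred, truth: sequences of (x, y) keypoint coordinates in pixels.
    visibility:  COCO-style flags per keypoint (0 means unlabeled).
    sigmas:      per-keypoint falloff constants (dataset specific).
    area:        ground-truth object area in pixels^2, assumed positive.
    """
    total, labeled = 0.0, 0
    for (px, py), (tx, ty), v, sigma in zip(pred, truth, visibility, sigmas):
        if v <= 0:
            continue  # unlabeled keypoints do not contribute
        d2 = (px - tx) ** 2 + (py - ty) ** 2  # squared pixel distance
        k2 = (2.0 * sigma) ** 2               # COCO uses k = 2 * sigma
        total += math.exp(-d2 / (2.0 * area * k2))
        labeled += 1
    # Mean similarity over labeled keypoints; 0.0 if none are labeled.
    return total / labeled if labeled else 0.0
```

A perfect prediction yields an OKS of 1.0, and the score decays toward 0 as keypoints drift relative to the object's scale. In COCO-style evaluation, a prediction counts as a match when its OKS exceeds a threshold, and AP is averaged over a range of such thresholds.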