HDA-pose: a real-time 2D human pose estimation method based on modified YOLOv8
2D human pose estimation aims to accurately regress the keypoints of the human body from images or videos. However, it remains challenging due to occlusion and intersection among multiple individuals and the difficulty of dealing with different body scales. To better tackle these issues, we...
Published in: | Signal, image and video processing, 2024-09, Vol.18 (8-9), p.5823-5839 |
---|---|
Main authors: | Dong, Chengang ; Tang, Yuhao ; Zhang, Liyan |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | 5839 |
---|---|
container_issue | 8-9 |
container_start_page | 5823 |
container_title | Signal, image and video processing |
container_volume | 18 |
creator | Dong, Chengang ; Tang, Yuhao ; Zhang, Liyan |
description | 2D human pose estimation aims to accurately regress the keypoints of the human body from images or videos. However, it remains challenging due to occlusion and intersection among multiple individuals and the difficulty of dealing with different body scales. To better tackle these issues, we propose a human pose estimation framework named HDA-Pose. By improving the real-time framework of YOLOv8, we achieve simultaneous regression of all individuals' keypoint locations in the image. Specifically, we propose the High-Grade Dual Attention (HDA) module to further enhance the focus of YOLOv8 on important features of individuals in the image. Additionally, we improve the original data augmentation strategy in YOLOv8 to better simulate cases where keypoints of individuals are occluded in the image. Lastly, we introduce a novel regression loss metric, Vertex Intersection over Union, to further enhance the effectiveness of the model in multi-person pose estimation. Our approach attains competitive results on multiple metrics of two open-source datasets, MS COCO 2017 and CrowdPose. Compared with the baseline model YOLOv8x-pose, HDA-Pose improves the average precision by 2.9% and 3.3% on the two datasets, respectively. |
doi_str_mv | 10.1007/s11760-024-03274-2 |
format | Article |
publisher | London: Springer London |
rights | The Author(s), under exclusive licence to Springer-Verlag London Ltd., part of Springer Nature 2024. Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law. |
addtitle | SIViP |
date | 2024-09-01 |
tpages | 17 |
toplevel | peer_reviewed ; online_resources |
fulltext | fulltext |
identifier | ISSN: 1863-1703 |
ispartof | Signal, image and video processing, 2024-09, Vol.18 (8-9), p.5823-5839 |
issn | 1863-1703 (ISSN) ; 1863-1711 (EISSN) |
language | eng |
recordid | cdi_proquest_journals_3086029658 |
source | SpringerLink Journals - AutoHoldings |
subjects | Computer Imaging ; Computer Science ; Data augmentation ; Datasets ; Image enhancement ; Image Processing and Computer Vision ; Multimedia Information Systems ; Occlusion ; Original Paper ; Pattern Recognition and Graphics ; Pose estimation ; Real time ; Regression models ; Signal, Image and Speech Processing ; Two dimensional bodies ; Vision |
title | HDA-pose: a real-time 2D human pose estimation method based on modified YOLOv8 |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-07T19%3A45%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=HDA-pose:%20a%20real-time%202D%20human%20pose%20estimation%20method%20based%20on%20modified%20YOLOv8&rft.jtitle=Signal,%20image%20and%20video%20processing&rft.au=Dong,%20Chengang&rft.date=2024-09-01&rft.volume=18&rft.issue=8-9&rft.spage=5823&rft.epage=5839&rft.pages=5823-5839&rft.issn=1863-1703&rft.eissn=1863-1711&rft_id=info:doi/10.1007/s11760-024-03274-2&rft_dat=%3Cproquest_cross%3E3086029658%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3086029658&rft_id=info:pmid/&rfr_iscdi=true |
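
The description in this record mentions a regression loss based on Intersection over Union. The record does not define the paper's Vertex Intersection over Union, so the following is only a minimal sketch of the standard axis-aligned box IoU and the common `1 - IoU` loss that such variants usually build on. The names `box_iou` and `iou_loss` and the `(x_min, y_min, x_max, y_max)` box layout are assumptions made for illustration, not the authors' implementation.

```python
from typing import Tuple

# Assumed box layout for this sketch: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def box_iou(a: Box, b: Box) -> float:
    """Standard IoU of two axis-aligned boxes (not the paper's Vertex IoU)."""
    # Corners of the overlap rectangle.
    ix_min = max(a[0], b[0])
    iy_min = max(a[1], b[1])
    ix_max = min(a[2], b[2])
    iy_max = min(a[3], b[3])

    # Clamp to zero when the boxes do not overlap.
    inter_w = max(0.0, ix_max - ix_min)
    inter_h = max(0.0, iy_max - iy_min)
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate boxes with zero total area are treated as IoU 0.
    return inter / union if union > 0 else 0.0


def iou_loss(pred: Box, target: Box) -> float:
    """Plain IoU loss: 0 for a perfect match, 1 for no overlap."""
    return 1.0 - box_iou(pred, target)
```

For two 2x2 boxes offset by one unit in each direction, the overlap is 1 and the union is 7, so `box_iou((0, 0, 2, 2), (1, 1, 3, 3))` evaluates to 1/7.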