UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning

In recent times, there has been growing interest in developing effective perception techniques for combining information from multiple modalities. This involves aligning features obtained from diverse sources to enable more efficient training with larger datasets and constraints, as well as leveraging the wealth of information contained in each modality. 2D and 3D Human Pose Estimation (HPE) are two critical perceptual tasks in computer vision with numerous downstream applications, such as action recognition, human-computer interaction, and object tracking. Yet there are few instances where the correlation between images and 2D/3D human poses has been clearly researched using a contrastive paradigm. In this paper, we propose UniHPE, a unified human pose estimation pipeline that aligns features from all three modalities, i.e., 2D human pose estimation, and lifting-based and image-based 3D human pose estimation, in the same pipeline. To align more than two modalities at the same time, we propose a novel singular-value-based contrastive learning loss, which better aligns the different modalities and further boosts performance. In our evaluation, UniHPE achieves remarkable performance: an MPJPE of 50.5 mm on the Human3.6M dataset and a PA-MPJPE of 51.6 mm on the 3DPW dataset. Our proposed method holds immense potential to advance the field of computer vision and contribute to various applications.
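
As background only: the abstract describes contrastive alignment of image, 2D-pose, and 3D-pose embeddings. Below is a minimal PyTorch sketch of standard CLIP-style InfoNCE alignment, naively extended to three modalities by summing over modality pairs. The function names and embedding shapes are hypothetical; UniHPE's actual singular-value-based loss is a different formulation whose details are not given in this record.

```python
import torch
import torch.nn.functional as F

def info_nce(emb_a: torch.Tensor, emb_b: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired embeddings, each (N, D)."""
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    logits = emb_a @ emb_b.t() / temperature  # (N, N) scaled cosine similarities
    targets = torch.arange(emb_a.size(0), device=emb_a.device)
    # Matched pairs lie on the diagonal; score alignment in both directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def three_way_alignment(img_emb, pose2d_emb, pose3d_emb):
    """Naive extension to three modalities: sum the loss over modality pairs."""
    return (info_nce(img_emb, pose2d_emb)
            + info_nce(img_emb, pose3d_emb)
            + info_nce(pose2d_emb, pose3d_emb))
```

A pairwise sum like this treats each modality pair independently; the abstract states that UniHPE instead uses a singular-value-based loss to align more than two modalities simultaneously, but its exact form is not given here.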

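For reference, the two reported metrics follow their standard definitions: MPJPE is the mean Euclidean distance between predicted and ground-truth 3D joints, and PA-MPJPE is the same error after a per-frame Procrustes (similarity) alignment. A short NumPy sketch, assuming single-frame (J, 3) joint arrays in millimetres:

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Per-Joint Position Error: mean Euclidean distance over joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """MPJPE after a Procrustes (similarity) alignment of pred onto gt."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation from the SVD of the cross-covariance matrix (Kabsch).
    U, s, Vt = np.linalg.svd(p.T @ g)
    if np.linalg.det(U @ Vt) < 0:      # avoid an improper reflection
        Vt[-1] *= -1
        s[-1] *= -1
    R = Vt.T @ U.T                     # rotation mapping pred onto gt
    scale = s.sum() / (p ** 2).sum()   # optimal isotropic scale
    aligned = scale * p @ R.T + mu_g
    return mpjpe(aligned, gt)
```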

Bibliographic Details
Main authors: Jiang, Zhongyu; Chai, Wenhao; Li, Lei; Zhou, Zhuoran; Yang, Cheng-Yen; Hwang, Jenq-Neng
Format: Article
Language: English (eng)
Subjects: Computer Science - Computer Vision and Pattern Recognition
Published: 2023-11-24
DOI: 10.48550/arxiv.2311.16477
Source: arXiv.org
Online access: https://arxiv.org/abs/2311.16477