UniHPE: Towards Unified Human Pose Estimation via Contrastive Learning

In recent times, there has been growing interest in developing effective perception techniques for combining information from multiple modalities. This involves aligning features obtained from diverse sources to enable more efficient training with larger datasets and constraints, as well as leveraging the wealth of information contained in each modality. 2D and 3D Human Pose Estimation (HPE) are two critical perceptual tasks in computer vision with numerous downstream applications, such as action recognition, human-computer interaction, and object tracking. Yet there are few instances where the correlation between images and 2D/3D human poses has been clearly researched using a contrastive paradigm. In this paper, we propose UniHPE, a unified human pose estimation pipeline that aligns features from all three modalities, i.e., 2D human pose estimation, and lifting-based and image-based 3D human pose estimation, in the same pipeline. To align more than two modalities at the same time, we propose a novel singular-value-based contrastive learning loss, which better aligns the different modalities and further boosts performance. In our evaluation, UniHPE achieves remarkable performance: an MPJPE of 50.5 mm on the Human3.6M dataset and a PA-MPJPE of 51.6 mm on the 3DPW dataset. Our proposed method holds immense potential to advance the field of computer vision and contribute to various applications.
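
As background only: the abstract describes contrastive alignment of image, 2D-pose, and 3D-pose embeddings. Below is a minimal PyTorch sketch of standard CLIP-style InfoNCE alignment, naively extended to three modalities by summing over modality pairs. The function names and embedding shapes are hypothetical; UniHPE's actual singular-value-based loss is a different formulation whose details are not given in this record.

```python
import torch
import torch.nn.functional as F

def info_nce(emb_a: torch.Tensor, emb_b: torch.Tensor,
             temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE over a batch of paired embeddings, each (N, D)."""
    emb_a = F.normalize(emb_a, dim=-1)
    emb_b = F.normalize(emb_b, dim=-1)
    logits = emb_a @ emb_b.t() / temperature  # (N, N) scaled cosine similarities
    targets = torch.arange(emb_a.size(0), device=emb_a.device)
    # Matched pairs lie on the diagonal; score alignment in both directions.
    return 0.5 * (F.cross_entropy(logits, targets)
                  + F.cross_entropy(logits.t(), targets))

def three_way_alignment(img_emb, pose2d_emb, pose3d_emb):
    """Naive extension to three modalities: sum the loss over modality pairs."""
    return (info_nce(img_emb, pose2d_emb)
            + info_nce(img_emb, pose3d_emb)
            + info_nce(pose2d_emb, pose3d_emb))
```

A pairwise sum like this treats each modality pair independently; the abstract states that UniHPE instead uses a singular-value-based loss to align more than two modalities simultaneously, but its exact form is not given here.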

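For reference, the two reported metrics follow their standard definitions: MPJPE is the mean Euclidean distance between predicted and ground-truth 3D joints, and PA-MPJPE is the same error after a per-frame Procrustes (similarity) alignment. A short NumPy sketch, assuming single-frame (J, 3) joint arrays in millimetres:

```python
import numpy as np

def mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """Mean Per-Joint Position Error: mean Euclidean distance over joints."""
    return float(np.linalg.norm(pred - gt, axis=-1).mean())

def pa_mpjpe(pred: np.ndarray, gt: np.ndarray) -> float:
    """MPJPE after a Procrustes (similarity) alignment of pred onto gt."""
    mu_p, mu_g = pred.mean(axis=0), gt.mean(axis=0)
    p, g = pred - mu_p, gt - mu_g
    # Optimal rotation from the SVD of the cross-covariance matrix (Kabsch).
    U, s, Vt = np.linalg.svd(p.T @ g)
    if np.linalg.det(U @ Vt) < 0:      # avoid an improper reflection
        Vt[-1] *= -1
        s[-1] *= -1
    R = Vt.T @ U.T                     # rotation mapping pred onto gt
    scale = s.sum() / (p ** 2).sum()   # optimal isotropic scale
    aligned = scale * p @ R.T + mu_g
    return mpjpe(aligned, gt)
```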

Bibliographic Details
Main authors: Jiang, Zhongyu; Chai, Wenhao; Li, Lei; Zhou, Zhuoran; Yang, Cheng-Yen; Hwang, Jenq-Neng
Format: Article
Language: English (eng)
Subjects: Computer Science - Computer Vision and Pattern Recognition
Published: 2023-11-24
DOI: 10.48550/arxiv.2311.16477
Source: arXiv.org
Online access: https://arxiv.org/abs/2311.16477