STRIDE: Single-video based Temporally Continuous Occlusion-Robust 3D Pose Estimation
The capability to accurately estimate 3D human poses is crucial for diverse fields such as action recognition, gait recognition, and virtual/augmented reality. However, a persistent and significant challenge within this field is the accurate prediction of human poses under conditions of severe occlusion.
Saved in:
Main authors: | Lal, Rohit; Bachu, Saketh; Garg, Yash; Dutta, Arindam; Ta, Calvin-Khang; Raychaudhuri, Dripta S; Cruz, Hannah Dela; Asif, M. Salman; Roy-Chowdhury, Amit K |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computer Vision and Pattern Recognition |
Online access: | Order full text |
---|---|
creator | Lal, Rohit; Bachu, Saketh; Garg, Yash; Dutta, Arindam; Ta, Calvin-Khang; Raychaudhuri, Dripta S; Cruz, Hannah Dela; Asif, M. Salman; Roy-Chowdhury, Amit K |
description | The capability to accurately estimate 3D human poses is crucial for diverse
fields such as action recognition, gait recognition, and virtual/augmented
reality. However, a persistent and significant challenge within this field is
the accurate prediction of human poses under conditions of severe occlusion.
Traditional image-based estimators struggle with heavy occlusions due to a lack
of temporal context, resulting in inconsistent predictions. While video-based
models benefit from processing temporal data, they encounter limitations when
faced with prolonged occlusions that extend over multiple frames. This
challenge arises because these models struggle to generalize beyond their
training datasets, and the variety of occlusions is hard to capture in the
training data. Addressing these challenges, we propose STRIDE (Single-video
based TempoRally contInuous Occlusion-Robust 3D Pose Estimation), a novel
Test-Time Training (TTT) approach to fit a human motion prior for each video.
This approach specifically handles occlusions that were not encountered during
the model's training. By employing STRIDE, we can refine a sequence of noisy
initial pose estimates into accurate, temporally coherent poses during test
time, effectively overcoming the limitations of prior methods. Our framework
demonstrates flexibility by being model-agnostic, allowing us to use any
off-the-shelf 3D pose estimation method for improving robustness and temporal
consistency. We validate STRIDE's efficacy through comprehensive experiments on
challenging datasets like Occluded Human3.6M, Human3.6M, and OCMotion, where it
not only outperforms existing single-image and video-based pose estimation
models but also showcases superior handling of substantial occlusions,
achieving fast, robust, accurate, and temporally consistent 3D pose estimates.
Code is made publicly available at https://github.com/take2rohit/stride |
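The abstract describes refining a noisy sequence of per-frame 3D pose estimates into a temporally coherent one at test time. The sketch below is not the STRIDE model itself; it only illustrates the general idea of test-time refinement with a temporal smoothness prior that down-weights low-confidence (occluded) frames. The function name, weighting scheme, and parameters are illustrative assumptions, not the paper's method.

```python
import numpy as np

def refine_poses(noisy, conf, lam=5.0, steps=2000, lr=0.01):
    """Refine a noisy 3D pose sequence at test time (illustrative sketch).

    noisy : (T, J, 3) array of per-frame joint estimates
    conf  : (T,) per-frame confidence in [0, 1]; low for occluded frames
    Minimizes a confidence-weighted data term plus a temporal
    acceleration (second-difference) penalty by gradient descent.
    """
    x = noisy.copy()
    w = conf[:, None, None]  # broadcast per-frame weights over joints/coords
    for _ in range(steps):
        # Data term gradient: pull toward observations where confidence is high.
        g_data = w * (x - noisy)
        # Smoothness gradient: penalize acceleration x_t - 2*x_{t+1} + x_{t+2}.
        acc = x[:-2] - 2 * x[1:-1] + x[2:]
        g_smooth = np.zeros_like(x)
        g_smooth[:-2] += acc
        g_smooth[1:-1] -= 2 * acc
        g_smooth[2:] += acc
        x -= lr * (g_data + lam * g_smooth)
    return x
```

With low confidence assigned to occluded frames, the smoothness prior effectively interpolates their poses from the confident neighbors, which is the basic intuition behind test-time temporal refinement.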
doi_str_mv | 10.48550/arxiv.2312.16221 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2312.16221 |
language | eng |
recordid | cdi_arxiv_primary_2312_16221 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition |
title | STRIDE: Single-video based Temporally Continuous Occlusion-Robust 3D Pose Estimation |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T17%3A48%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=STRIDE:%20Single-video%20based%20Temporally%20Continuous%20Occlusion-Robust%203D%20Pose%20Estimation&rft.au=Lal,%20Rohit&rft.date=2023-12-24&rft_id=info:doi/10.48550/arxiv.2312.16221&rft_dat=%3Carxiv_GOX%3E2312_16221%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |