RoHM: Robust Human Motion Reconstruction via Diffusion

We propose RoHM, an approach for robust 3D human motion reconstruction from monocular RGB(-D) videos in the presence of noise and occlusions. Most previous approaches either train neural networks to directly regress motion in 3D or learn data-driven motion priors and combine them with optimization at test time.

Full description

Saved in:
Bibliographic details
Main authors: Zhang, Siwei; Bhatnagar, Bharat Lal; Xu, Yuanlu; Winkler, Alexander; Kadlecek, Petr; Tang, Siyu; Bogo, Federica
Format: Article
Language: eng
Subjects:
Online access: Order full text
creator Zhang, Siwei
Bhatnagar, Bharat Lal
Xu, Yuanlu
Winkler, Alexander
Kadlecek, Petr
Tang, Siyu
Bogo, Federica
description We propose RoHM, an approach for robust 3D human motion reconstruction from monocular RGB(-D) videos in the presence of noise and occlusions. Most previous approaches either train neural networks to directly regress motion in 3D or learn data-driven motion priors and combine them with optimization at test time. The former do not recover globally coherent motion and fail under occlusions; the latter are time-consuming, prone to local minima, and require manual tuning. To overcome these shortcomings, we exploit the iterative, denoising nature of diffusion models. RoHM is a novel diffusion-based motion model that, conditioned on noisy and occluded input data, reconstructs complete, plausible motions in consistent global coordinates. Given the complexity of the problem -- requiring one to address different tasks (denoising and infilling) in different solution spaces (local and global motion) -- we decompose it into two sub-tasks and learn two models, one for global trajectory and one for local motion. To capture the correlations between the two, we then introduce a novel conditioning module, combining it with an iterative inference scheme. We apply RoHM to a variety of tasks -- from motion reconstruction and denoising to spatial and temporal infilling. Extensive experiments on three popular datasets show that our method outperforms state-of-the-art approaches qualitatively and quantitatively, while being faster at test time. The code is available at https://sanweiliti.github.io/ROHM/ROHM.html.
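The abstract's decomposition into two models (global trajectory and local motion), each conditioned on the other and run under an iterative inference scheme, can be sketched in miniature. The following is an illustrative toy under stated assumptions: a hand-written neighbour-smoothing step stands in for each learned diffusion denoiser, and all function names (`denoise_step`, `rohm_style_inference`) are hypothetical, not part of the released RoHM code.

```python
# Toy sketch of RoHM-style iterative inference with two coupled models:
# one stream for the global trajectory, one for local motion, each
# conditioned on the other's current estimate. Illustrative only --
# real RoHM uses learned diffusion denoisers; here simple neighbour
# smoothing stands in for each model.

def denoise_step(signal, cond, beta=0.1):
    """One hypothetical denoising step on a 1-D sequence: neighbour
    smoothing plus a small pull toward the conditioning stream."""
    n = len(signal)
    out = []
    for i in range(n):
        prev = signal[max(i - 1, 0)]
        nxt = signal[min(i + 1, n - 1)]
        smooth = 0.5 * signal[i] + 0.25 * prev + 0.25 * nxt
        out.append((1 - beta) * smooth + beta * cond[i])
    return out

def rohm_style_inference(noisy_traj, noisy_local, iters=3):
    """Alternate between the two 'models', each conditioned on the
    other's latest estimate, mimicking the iterative scheme."""
    traj, local = list(noisy_traj), list(noisy_local)
    for _ in range(iters):
        traj = denoise_step(traj, local)   # trajectory model, conditioned on local motion
        local = denoise_step(local, traj)  # local-motion model, conditioned on trajectory
    return traj, local
```

Running a few iterations on a jittery sequence progressively smooths both streams while keeping them coupled, which is the structural point the abstract makes about capturing correlations between global trajectory and local motion.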
doi_str_mv 10.48550/arxiv.2401.08570
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2401.08570
language eng
recordid cdi_arxiv_primary_2401_08570
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title RoHM: Robust Human Motion Reconstruction via Diffusion
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T20%3A24%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=RoHM:%20Robust%20Human%20Motion%20Reconstruction%20via%20Diffusion&rft.au=Zhang,%20Siwei&rft.date=2024-01-16&rft_id=info:doi/10.48550/arxiv.2401.08570&rft_dat=%3Carxiv_GOX%3E2401_08570%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true