RoHM: Robust Human Motion Reconstruction via Diffusion

We propose RoHM, an approach for robust 3D human motion reconstruction from monocular RGB(-D) videos in the presence of noise and occlusions. Most previous approaches either train neural networks to directly regress motion in 3D or learn data-driven motion priors and combine them with optimization at test time.

Full description

Saved in:
Bibliographic details
Main authors: Zhang, Siwei; Bhatnagar, Bharat Lal; Xu, Yuanlu; Winkler, Alexander; Kadlecek, Petr; Tang, Siyu; Bogo, Federica
Format: Article
Language: eng
Subjects:
Online access: Order full text
creator Zhang, Siwei
Bhatnagar, Bharat Lal
Xu, Yuanlu
Winkler, Alexander
Kadlecek, Petr
Tang, Siyu
Bogo, Federica
description We propose RoHM, an approach for robust 3D human motion reconstruction from monocular RGB(-D) videos in the presence of noise and occlusions. Most previous approaches either train neural networks to directly regress motion in 3D or learn data-driven motion priors and combine them with optimization at test time. The former do not recover globally coherent motion and fail under occlusions; the latter are time-consuming, prone to local minima, and require manual tuning. To overcome these shortcomings, we exploit the iterative, denoising nature of diffusion models. RoHM is a novel diffusion-based motion model that, conditioned on noisy and occluded input data, reconstructs complete, plausible motions in consistent global coordinates. Given the complexity of the problem -- requiring one to address different tasks (denoising and infilling) in different solution spaces (local and global motion) -- we decompose it into two sub-tasks and learn two models, one for global trajectory and one for local motion. To capture the correlations between the two, we then introduce a novel conditioning module, combining it with an iterative inference scheme. We apply RoHM to a variety of tasks -- from motion reconstruction and denoising to spatial and temporal infilling. Extensive experiments on three popular datasets show that our method outperforms state-of-the-art approaches qualitatively and quantitatively, while being faster at test time. The code is available at https://sanweiliti.github.io/ROHM/ROHM.html.
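The abstract's decomposition into two models (global trajectory and local motion), each conditioned on the other and run under an iterative inference scheme, can be sketched in miniature. The following is an illustrative toy under stated assumptions: a hand-written neighbour-smoothing step stands in for each learned diffusion denoiser, and all function names (`denoise_step`, `rohm_style_inference`) are hypothetical, not part of the released RoHM code.

```python
# Toy sketch of RoHM-style iterative inference with two coupled models:
# one stream for the global trajectory, one for local motion, each
# conditioned on the other's current estimate. Illustrative only --
# real RoHM uses learned diffusion denoisers; here simple neighbour
# smoothing stands in for each model.

def denoise_step(signal, cond, beta=0.1):
    """One hypothetical denoising step on a 1-D sequence: neighbour
    smoothing plus a small pull toward the conditioning stream."""
    n = len(signal)
    out = []
    for i in range(n):
        prev = signal[max(i - 1, 0)]
        nxt = signal[min(i + 1, n - 1)]
        smooth = 0.5 * signal[i] + 0.25 * prev + 0.25 * nxt
        out.append((1 - beta) * smooth + beta * cond[i])
    return out

def rohm_style_inference(noisy_traj, noisy_local, iters=3):
    """Alternate between the two 'models', each conditioned on the
    other's latest estimate, mimicking the iterative scheme."""
    traj, local = list(noisy_traj), list(noisy_local)
    for _ in range(iters):
        traj = denoise_step(traj, local)   # trajectory model, conditioned on local motion
        local = denoise_step(local, traj)  # local-motion model, conditioned on trajectory
    return traj, local
```

Running a few iterations on a jittery sequence progressively smooths both streams while keeping them coupled, which is the structural point the abstract makes about capturing correlations between global trajectory and local motion.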
doi_str_mv 10.48550/arxiv.2401.08570
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2401.08570
language eng
recordid cdi_arxiv_primary_2401_08570
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title RoHM: Robust Human Motion Reconstruction via Diffusion
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-15T20%3A24%3A09IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=RoHM:%20Robust%20Human%20Motion%20Reconstruction%20via%20Diffusion&rft.au=Zhang,%20Siwei&rft.date=2024-01-16&rft_id=info:doi/10.48550/arxiv.2401.08570&rft_dat=%3Carxiv_GOX%3E2401_08570%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true