Neural monocular 3D human motion capture with physical awareness

We present a new trainable system for physically plausible markerless 3D human motion capture, which achieves state-of-the-art results in a broad range of challenging scenarios. Unlike most neural methods for human motion capture, our approach, which we dub "physionical", is aware of physi...

Bibliographic details
Published in: ACM transactions on graphics, 2021-08, Vol. 40 (4), p. 1-15, Article 83
Main authors: Shimada, Soshi; Golyanik, Vladislav; Xu, Weipeng; Pérez, Patrick; Theobalt, Christian
Format: Article
Language: English
Online access: Full text
container_end_page 15
container_issue 4
container_start_page 1
container_title ACM transactions on graphics
container_volume 40
creator Shimada, Soshi
Golyanik, Vladislav
Xu, Weipeng
Pérez, Patrick
Theobalt, Christian
description We present a new trainable system for physically plausible markerless 3D human motion capture, which achieves state-of-the-art results in a broad range of challenging scenarios. Unlike most neural methods for human motion capture, our approach, which we dub "physionical", is aware of physical and environmental constraints. It combines, in a fully differentiable way, several key innovations: 1) a proportional-derivative controller, with gains predicted by a neural network, that reduces delays even in the presence of fast motions; 2) an explicit rigid body dynamics model; and 3) a novel optimisation layer that prevents physically implausible foot-floor penetration as a hard constraint. The inputs to our system are 2D joint keypoints, which are canonicalised in a novel way so as to reduce the dependency on intrinsic camera parameters, both at training and test time. This enables more accurate global translation estimation without generalisability loss. Our model can be fine-tuned with 2D annotations alone when 3D annotations are not available. It produces smooth and physically principled 3D motions at an interactive frame rate in a wide variety of challenging scenes, including newly recorded ones. Its advantages are especially noticeable on in-the-wild sequences that significantly differ from common 3D pose estimation benchmarks such as Human3.6M and MPI-INF-3DHP. Qualitative results are provided in the supplementary video.
doi_str_mv 10.1145/3450626.3459825
format Article
publisher New York, NY, USA: ACM
date 2021-08-31
stitle ACM TOG
artnum 83
tpages 15
rights Owner/Author
toplevel peer_reviewed
oa free_for_read
linktopdf https://dl.acm.org/doi/pdf/10.1145/3450626.3459825
fulltext fulltext
identifier ISSN: 0730-0301
ispartof ACM transactions on graphics, 2021-08, Vol.40 (4), p.1-15, Article 83
issn 0730-0301
1557-7368
language eng
recordid cdi_crossref_primary_10_1145_3450626_3459825
source ACM Digital Library Complete
subjects Animation
Computer graphics
Computing methodologies
Motion capture
title Neural monocular 3D human motion capture with physical awareness
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T20%3A23%3A16IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Neural%20monocular%203D%20human%20motion%20capture%20with%20physical%20awareness&rft.jtitle=ACM%20transactions%20on%20graphics&rft.au=Shimada,%20Soshi&rft.date=2021-08-31&rft.volume=40&rft.issue=4&rft.spage=1&rft.epage=15&rft.pages=1-15&rft.artnum=83&rft.issn=0730-0301&rft.eissn=1557-7368&rft_id=info:doi/10.1145/3450626.3459825&rft_dat=%3Cacm_cross%3E3459825%3C/acm_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true