Deep Non-Rigid Structure From Motion With Missing Data

Non-rigid structure from motion (NRSfM) refers to the problem of reconstructing cameras and the 3D point cloud of a non-rigid object from an ensemble of images with 2D correspondences. Current NRSfM algorithms are limited from two perspectives: (i) the number of images, and (ii) the type of shape variability they can handle. These difficulties stem from the inherent conflict between the conditioning of the system and the degrees of freedom that need to be modeled, which has hampered NRSfM's practical utility for many applications within vision. In this paper we propose a novel hierarchical sparse coding model for NRSfM which can overcome (i) and (ii) to such an extent that NRSfM can be applied to problems in vision previously thought too ill-posed. Our approach is realized in practice as the training of an unsupervised deep neural network (DNN) auto-encoder with a unique architecture that is able to disentangle pose from 3D structure. Using modern deep learning computational platforms allows us to solve NRSfM problems at an unprecedented scale and shape complexity. Our approach has no 3D supervision, relying solely on 2D point correspondences. Further, our approach is able to handle missing/occluded 2D points without the need for matrix completion. Extensive experiments demonstrate the impressive performance of our approach, which exhibits superior precision and robustness against all available state-of-the-art works, in some instances by an order of magnitude. We further propose a new quality measure (based on the network weights) which circumvents the need for 3D ground truth to ascertain the confidence we have in the reconstructability. We believe our work to be a significant advance over the state of the art in NRSfM.
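The classical background the abstract builds on can be made concrete with a small sketch. Below is a toy version of the standard low-rank NRSfM measurement model (in the spirit of Bregler-style shape-basis factorization), not the paper's hierarchical sparse-coding network: each frame's 3D shape is a combination of a few basis shapes viewed by an orthographic camera, which bounds the rank of the stacked 2D measurement matrix. All names and dimensions here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy low-rank NRSfM setup (classical shape-basis model, NOT the paper's
# hierarchical sparse-coding auto-encoder): F frames, P tracked points,
# K basis shapes.
F, P, K = 12, 20, 3
basis = rng.standard_normal((K, 3, P))   # K basis shapes, each 3 x P

W_rows = []
for f in range(F):
    coeffs = rng.standard_normal(K)          # per-frame shape coefficients
    S_f = np.tensordot(coeffs, basis, 1)     # 3 x P non-rigid shape
    # Random orthographic camera: first two rows of a rotation matrix.
    Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))
    W_rows.append(Q[:2] @ S_f)               # 2 x P projected points
W = np.concatenate(W_rows, 0)                # 2F x P measurement matrix

# The low-rank model implies rank(W) <= 3K -- the constraint that
# classical factorization methods exploit.
print(np.linalg.matrix_rank(W))
```

Factorization approaches recover cameras and shapes from this rank constraint; the number of images and the expressiveness of the single linear basis are exactly the limitations (i) and (ii) the abstract refers to, which the paper's hierarchical model is designed to relax.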

Bibliographic Details

Published in: IEEE transactions on pattern analysis and machine intelligence, 2021-12, Vol. 43 (12), p. 4365-4377
Main authors: Kong, Chen; Lucey, Simon
Format: Article
Language: English
DOI: 10.1109/TPAMI.2020.2997026
Publisher: New York: IEEE
PMID: 32750772
ISSN: 0162-8828
EISSN: 1939-3539, 2160-9292
Source: IEEE Electronic Library (IEL)
Subjects:
Algorithms
Artificial neural networks
Coders
deep neural network
Encoding
hierarchical sparse coding
Image reconstruction
Machine learning
Missing data
Neural networks
Nonrigid structure from motion
reconstructability
Rigid structures
Shape
Structure from motion
Three dimensional models
Three-dimensional displays
Two dimensional displays
Vision