iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views

We present iFusion, a novel 3D object reconstruction framework that requires only two views with unknown camera poses. While single-view reconstruction yields visually appealing results, it can deviate significantly from the actual object, especially on unseen sides. Additional views improve reconstruction fidelity but necessitate known camera poses. However, assuming the availability of pose may be unrealistic, and existing pose estimators fail in sparse view scenarios. To address this, we harness a pre-trained novel view synthesis diffusion model, which embeds implicit knowledge about the geometry and appearance of diverse objects. Our strategy unfolds in three steps: (1) We invert the diffusion model for camera pose estimation instead of synthesizing novel views. (2) The diffusion model is fine-tuned using provided views and estimated poses, turned into a novel view synthesizer tailored for the target object. (3) Leveraging registered views and the fine-tuned diffusion model, we reconstruct the 3D object. Experiments demonstrate strong performance in both pose estimation and novel view synthesis. Moreover, iFusion seamlessly integrates with various reconstruction methods and enhances them.

Bibliographic Details
Main Authors: Wu, Chin-Hsuan, Chen, Yen-Chun, Solarte, Bolivar, Yuan, Lu, Sun, Min
Format: Article
Language: English
Subjects: Computer Science - Computer Vision and Pattern Recognition
Online Access: Order full text
creator Wu, Chin-Hsuan; Chen, Yen-Chun; Solarte, Bolivar; Yuan, Lu; Sun, Min
description We present iFusion, a novel 3D object reconstruction framework that requires only two views with unknown camera poses. While single-view reconstruction yields visually appealing results, it can deviate significantly from the actual object, especially on unseen sides. Additional views improve reconstruction fidelity but necessitate known camera poses. However, assuming the availability of pose may be unrealistic, and existing pose estimators fail in sparse view scenarios. To address this, we harness a pre-trained novel view synthesis diffusion model, which embeds implicit knowledge about the geometry and appearance of diverse objects. Our strategy unfolds in three steps: (1) We invert the diffusion model for camera pose estimation instead of synthesizing novel views. (2) The diffusion model is fine-tuned using provided views and estimated poses, turned into a novel view synthesizer tailored for the target object. (3) Leveraging registered views and the fine-tuned diffusion model, we reconstruct the 3D object. Experiments demonstrate strong performance in both pose estimation and novel view synthesis. Moreover, iFusion seamlessly integrates with various reconstruction methods and enhances them.
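The three-step strategy in the abstract can be illustrated in miniature. The sketch below covers only step (1), pose estimation by "inverting" a frozen view synthesizer: `render_error` is a hypothetical stand-in for the diffusion denoising loss conditioned on a candidate relative pose, and `TRUE_POSE` is an arbitrary toy target. Neither the real iFusion objective nor its diffusion model is reproduced here; the point is simply that the model's weights stay fixed while the pose parameters are optimized against its loss.

```python
import numpy as np

# Toy stand-in for the true relative pose between the two input views
# (e.g. an azimuth/elevation offset). Purely illustrative values.
TRUE_POSE = np.array([0.6, -0.3])

def render_error(pose):
    # Placeholder for the diffusion denoising loss: in iFusion this would
    # come from a frozen novel-view synthesis diffusion model conditioned
    # on one view and a candidate pose, scored against the other view.
    return float(np.sum((pose - TRUE_POSE) ** 2))

def estimate_pose(steps=200, lr=0.1, eps=1e-4):
    pose = np.zeros(2)  # initial pose guess
    for _ in range(steps):
        # Gradient of the loss w.r.t. the pose only (finite differences
        # here); the synthesis model itself is never updated.
        grad = np.array([
            (render_error(pose + eps * e) - render_error(pose - eps * e)) / (2 * eps)
            for e in np.eye(2)
        ])
        pose -= lr * grad
    return pose

estimated = estimate_pose()
print(estimated)  # converges toward TRUE_POSE
```

Steps (2) and (3) would then reuse the recovered poses: fine-tune the synthesizer on the registered views, and feed both into any downstream 3D reconstruction method.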
doi_str_mv 10.48550/arxiv.2312.17250
format Article
creationdate 2023-12-28
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
identifier DOI: 10.48550/arxiv.2312.17250
language eng
recordid cdi_arxiv_primary_2312_17250
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title iFusion: Inverting Diffusion for Pose-Free Reconstruction from Sparse Views
url https://arxiv.org/abs/2312.17250