M3D: Dual-Stream Selective State Spaces and Depth-Driven Framework for High-Fidelity Single-View 3D Reconstruction


Bibliographic details
Main authors: Zhang, Luoxi; Shrestha, Pragyan; Zhou, Yu; Xie, Chun; Kitahara, Itaru
Format: Article
Language: English
Subjects: Computer Science - Computer Vision and Pattern Recognition
Online access: https://arxiv.org/abs/2411.12635
creator Zhang, Luoxi ; Shrestha, Pragyan ; Zhou, Yu ; Xie, Chun ; Kitahara, Itaru
description The precise reconstruction of 3D objects from a single RGB image in complex scenes presents a critical challenge in virtual reality, autonomous driving, and robotics. Existing neural implicit 3D representation methods face significant difficulties in balancing the extraction of global and local features, particularly in diverse and complex environments, leading to insufficient reconstruction precision and quality. We propose M3D, a novel single-view 3D reconstruction framework, to tackle these challenges. This framework adopts a dual-stream feature extraction strategy based on Selective State Spaces to effectively balance the extraction of global and local features, thereby improving scene comprehension and representation precision. Additionally, a parallel branch extracts depth information, effectively integrating visual and geometric features to enhance reconstruction quality and preserve intricate details. Experimental results indicate that the fusion of multi-scale features with depth information via the dual-branch feature extraction significantly boosts geometric consistency and fidelity, achieving state-of-the-art reconstruction performance.
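The description outlines M3D's dual-branch design: a visual stream that balances global and local feature extraction via Selective State Spaces, and a parallel depth branch whose geometric features are fused with the visual ones. The PyTorch snippet below is a minimal illustrative sketch of that fusion idea only, not the authors' implementation: the class name DualStreamDepthFusion, the channel sizes, and the plain convolutional stacks standing in for the Selective State Space (Mamba-style) blocks are all assumptions, and the depth map is assumed to come from an external monocular depth estimator.

```python
# Illustrative sketch only: dual-branch visual features plus a parallel depth branch,
# fused at a coarse spatial grid. All names and sizes are assumptions, not M3D's code.
import torch
import torch.nn as nn


class DualStreamDepthFusion(nn.Module):
    """Toy dual-stream RGB encoder plus a parallel depth branch, fused by a 1x1 conv."""

    def __init__(self, feat_dim: int = 64):
        super().__init__()
        # Global branch: large receptive field for scene-level context.
        # (The paper uses Selective State Space blocks here; a plain conv is a stand-in.)
        self.global_branch = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=7, stride=2, padding=3),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(16),
        )
        # Local branch: small kernels aimed at fine detail.
        self.local_branch = nn.Sequential(
            nn.Conv2d(3, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(16),
        )
        # Parallel depth branch operating on a single-channel depth map.
        self.depth_branch = nn.Sequential(
            nn.Conv2d(1, feat_dim, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.AdaptiveAvgPool2d(16),
        )
        # Fuse visual (global + local) and geometric (depth) features.
        self.fuse = nn.Conv2d(3 * feat_dim, feat_dim, kernel_size=1)

    def forward(self, rgb: torch.Tensor, depth: torch.Tensor) -> torch.Tensor:
        g = self.global_branch(rgb)    # coarse global context
        l = self.local_branch(rgb)     # fine local detail
        d = self.depth_branch(depth)   # geometric cues
        return self.fuse(torch.cat([g, l, d], dim=1))


if __name__ == "__main__":
    model = DualStreamDepthFusion()
    rgb = torch.randn(1, 3, 224, 224)     # single RGB view
    depth = torch.randn(1, 1, 224, 224)   # depth map from an external estimator
    print(model(rgb, depth).shape)        # torch.Size([1, 64, 16, 16])
```

In the actual framework the fused features would condition a neural implicit 3D representation; this sketch stops at feature fusion, which is the part the abstract describes concretely.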
doi_str_mv 10.48550/arxiv.2411.12635
format Article
creationdate 2024-11-19
rights http://creativecommons.org/licenses/by-nc-nd/4.0
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2411.12635
language eng
recordid cdi_arxiv_primary_2411_12635
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
title M3D: Dual-Stream Selective State Spaces and Depth-Driven Framework for High-Fidelity Single-View 3D Reconstruction
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T02%3A15%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=M3D:%20Dual-Stream%20Selective%20State%20Spaces%20and%20Depth-Driven%20Framework%20for%20High-Fidelity%20Single-View%203D%20Reconstruction&rft.au=Zhang,%20Luoxi&rft.date=2024-11-19&rft_id=info:doi/10.48550/arxiv.2411.12635&rft_dat=%3Carxiv_GOX%3E2411_12635%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true