SceneDreamer: Unbounded 3D Scene Generation From 2D Image Collections

In this work, we present SceneDreamer, an unconditional generative model for unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random noise. Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations. At the core of SceneDreamer is a principl...

Bibliographic details
Published in:	IEEE transactions on pattern analysis and machine intelligence, 2023-12, Vol.45 (12), p.15562-15576
Main authors:	Chen, Zhaoxi; Wang, Guangcong; Liu, Ziwei
Format:	Article
Language:	eng
Subjects:	3D generative model; Annotations; Cameras; GAN; Geometry; neural rendering; Parameterization; Random noise; Renderers; Rendering (computer graphics); Representations; Scene generation; Semantics; Solid modeling; Three dimensional models; Three-dimensional displays; Training; unbounded scene generation

container_end_page	15576
container_issue	12
container_start_page	15562
container_title	IEEE transactions on pattern analysis and machine intelligence
container_volume	45
creator	Chen, Zhaoxi; Wang, Guangcong; Liu, Ziwei
description	In this work, we present SceneDreamer, an unconditional generative model for unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random noise. Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations. At the core of SceneDreamer is a principled learning paradigm comprising: 1) an efficient yet expressive 3D scene representation, 2) a generative scene parameterization, and 3) an effective renderer that can leverage the knowledge from 2D images. Our approach begins with an efficient bird's-eye-view (BEV) representation generated from simplex noise, which includes a height field for surface elevation and a semantic field for detailed scene semantics. This BEV scene representation enables: 1) representing a 3D scene with quadratic complexity, 2) disentangled geometry and semantics, and 3) efficient training. Moreover, we propose a novel generative neural hash grid to parameterize the latent space based on 3D positions and scene semantics, aiming to encode generalizable features across various scenes. Lastly, a neural volumetric renderer, learned from 2D image collections through adversarial training, is employed to produce photorealistic images. Extensive experiments demonstrate the effectiveness of SceneDreamer and superiority over state-of-the-art methods in generating vivid yet diverse unbounded 3D worlds.
doi_str_mv	10.1109/TPAMI.2023.3321857
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_10269790</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10269790</ieee_id><sourcerecordid>2872808173</sourcerecordid><originalsourceid>FETCH-LOGICAL-c329t-f36d9182dafa1a87eb14246989e4936b3d3f25552c63b7a19a02fb23e3ffac483</originalsourceid><addsrcrecordid>eNpdkLtOAzEQRS0EIiHwA4hiJRqaDfbMPmy6KC8iBYFEUlve3TFKtI9gZwv-ns2jQDQzxT13NDqM3Qs-FIKr59XH6G0xBA44RAQh4_SC9UEkPFSg4JL1uUgglBJkj914v-VcRDHHa9bDNJVSKOyz6WdONU0cmYrcS7Cus6atCyoCnATHKJh3w5n9pqmDmWuqACbBojJfFIybsqT8EPhbdmVN6enuvAdsPZuuxq_h8n2-GI-WYY6g9qHFpFBCQmGsEUamlIkIokRJRZHCJMMCLcRxDHmCWWqEMhxsBkhorckjiQP2dLq7c813S36vq43PqSxNTU3rNcgUJJcixQ59_Idum9bV3XcdJeMkjmIZdRScqNw13juyeuc2lXE_WnB9kKyPkvVBsj5L7koPp9KGiP4UIFGp4vgLDDx0Pw</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2885654584</pqid></control><display><type>article</type><title>SceneDreamer: Unbounded 3D Scene Generation From 2D Image Collections</title><source>IEEE Electronic Library (IEL)</source><creator>Chen, Zhaoxi ; Wang, Guangcong ; Liu, Ziwei</creator><creatorcontrib>Chen, Zhaoxi ; Wang, Guangcong ; Liu, Ziwei</creatorcontrib><description>In this work, we present SceneDreamer , an unconditional generative model for unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random noise. Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations. At the core of SceneDreamer is a principled learning paradigm comprising: 1) an efficient yet expressive 3D scene representation, 2) a generative scene parameterization, and 3) an effective renderer that can leverage the knowledge from 2D images. Our approach begins with an efficient bird's-eye-view (BEV) representation generated from simplex noise, which includes a height field for surface elevation and a semantic field for detailed scene semantics. 
This BEV scene representation enables: 1) representing a 3D scene with quadratic complexity, 2) disentangled geometry and semantics, and 3) efficient training. Moreover, we propose a novel generative neural hash grid to parameterize the latent space based on 3D positions and scene semantics, aiming to encode generalizable features across various scenes. Lastly, a neural volumetric renderer, learned from 2D image collections through adversarial training, is employed to produce photorealistic images. Extensive experiments demonstrate the effectiveness of SceneDreamer and superiority over state-of-the-art methods in generating vivid yet diverse unbounded 3D worlds.</description><identifier>ISSN: 0162-8828</identifier><identifier>EISSN: 2160-9292</identifier><identifier>EISSN: 1939-3539</identifier><identifier>DOI: 10.1109/TPAMI.2023.3321857</identifier><identifier>PMID: 37788193</identifier><identifier>CODEN: ITPIDJ</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>3D generative model ; Annotations ; Cameras ; GAN ; Geometry ; neural rendering ; Parameterization ; Random noise ; Renderers ; Rendering (computer graphics) ; Representations ; Scene generation ; Semantics ; Solid modeling ; Three dimensional models ; Three-dimensional displays ; Training ; unbounded scene generation</subject><ispartof>IEEE transactions on pattern analysis and machine intelligence, 2023-12, Vol.45 (12), p.15562-15576</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2023</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c329t-f36d9182dafa1a87eb14246989e4936b3d3f25552c63b7a19a02fb23e3ffac483</citedby><cites>FETCH-LOGICAL-c329t-f36d9182dafa1a87eb14246989e4936b3d3f25552c63b7a19a02fb23e3ffac483</cites><orcidid>0000-0002-4220-5958 ; 0000-0002-6627-814X ; 0000-0003-3998-7044</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10269790$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27903,27904,54736</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10269790$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Chen, Zhaoxi</creatorcontrib><creatorcontrib>Wang, Guangcong</creatorcontrib><creatorcontrib>Liu, Ziwei</creatorcontrib><title>SceneDreamer: Unbounded 3D Scene Generation From 2D Image Collections</title><title>IEEE transactions on pattern analysis and machine intelligence</title><addtitle>TPAMI</addtitle><description>In this work, we present SceneDreamer , an unconditional generative model for unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random noise. Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations. At the core of SceneDreamer is a principled learning paradigm comprising: 1) an efficient yet expressive 3D scene representation, 2) a generative scene parameterization, and 3) an effective renderer that can leverage the knowledge from 2D images. Our approach begins with an efficient bird's-eye-view (BEV) representation generated from simplex noise, which includes a height field for surface elevation and a semantic field for detailed scene semantics. 
This BEV scene representation enables: 1) representing a 3D scene with quadratic complexity, 2) disentangled geometry and semantics, and 3) efficient training. Moreover, we propose a novel generative neural hash grid to parameterize the latent space based on 3D positions and scene semantics, aiming to encode generalizable features across various scenes. Lastly, a neural volumetric renderer, learned from 2D image collections through adversarial training, is employed to produce photorealistic images. Extensive experiments demonstrate the effectiveness of SceneDreamer and superiority over state-of-the-art methods in generating vivid yet diverse unbounded 3D worlds.</description><subject>3D generative model</subject><subject>Annotations</subject><subject>Cameras</subject><subject>GAN</subject><subject>Geometry</subject><subject>neural rendering</subject><subject>Parameterization</subject><subject>Random noise</subject><subject>Renderers</subject><subject>Rendering (computer graphics)</subject><subject>Representations</subject><subject>Scene generation</subject><subject>Semantics</subject><subject>Solid modeling</subject><subject>Three dimensional models</subject><subject>Three-dimensional displays</subject><subject>Training</subject><subject>unbounded scene 
generation</subject><issn>0162-8828</issn><issn>2160-9292</issn><issn>1939-3539</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2023</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpdkLtOAzEQRS0EIiHwA4hiJRqaDfbMPmy6KC8iBYFEUlve3TFKtI9gZwv-ns2jQDQzxT13NDqM3Qs-FIKr59XH6G0xBA44RAQh4_SC9UEkPFSg4JL1uUgglBJkj914v-VcRDHHa9bDNJVSKOyz6WdONU0cmYrcS7Cus6atCyoCnATHKJh3w5n9pqmDmWuqACbBojJfFIybsqT8EPhbdmVN6enuvAdsPZuuxq_h8n2-GI-WYY6g9qHFpFBCQmGsEUamlIkIokRJRZHCJMMCLcRxDHmCWWqEMhxsBkhorckjiQP2dLq7c813S36vq43PqSxNTU3rNcgUJJcixQ59_Idum9bV3XcdJeMkjmIZdRScqNw13juyeuc2lXE_WnB9kKyPkvVBsj5L7koPp9KGiP4UIFGp4vgLDDx0Pw</recordid><startdate>20231201</startdate><enddate>20231201</enddate><creator>Chen, Zhaoxi</creator><creator>Wang, Guangcong</creator><creator>Liu, Ziwei</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><scope>7X8</scope><orcidid>https://orcid.org/0000-0002-4220-5958</orcidid><orcidid>https://orcid.org/0000-0002-6627-814X</orcidid><orcidid>https://orcid.org/0000-0003-3998-7044</orcidid></search><sort><creationdate>20231201</creationdate><title>SceneDreamer: Unbounded 3D Scene Generation From 2D Image Collections</title><author>Chen, Zhaoxi ; Wang, Guangcong ; Liu, Ziwei</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c329t-f36d9182dafa1a87eb14246989e4936b3d3f25552c63b7a19a02fb23e3ffac483</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2023</creationdate><topic>3D generative model</topic><topic>Annotations</topic><topic>Cameras</topic><topic>GAN</topic><topic>Geometry</topic><topic>neural 
rendering</topic><topic>Parameterization</topic><topic>Random noise</topic><topic>Renderers</topic><topic>Rendering (computer graphics)</topic><topic>Representations</topic><topic>Scene generation</topic><topic>Semantics</topic><topic>Solid modeling</topic><topic>Three dimensional models</topic><topic>Three-dimensional displays</topic><topic>Training</topic><topic>unbounded scene generation</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Chen, Zhaoxi</creatorcontrib><creatorcontrib>Wang, Guangcong</creatorcontrib><creatorcontrib>Liu, Ziwei</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics & Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><collection>MEDLINE - Academic</collection><jtitle>IEEE transactions on pattern analysis and machine intelligence</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Chen, Zhaoxi</au><au>Wang, Guangcong</au><au>Liu, Ziwei</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>SceneDreamer: Unbounded 3D Scene Generation From 2D Image Collections</atitle><jtitle>IEEE transactions on pattern analysis and machine 
intelligence</jtitle><stitle>TPAMI</stitle><date>2023-12-01</date><risdate>2023</risdate><volume>45</volume><issue>12</issue><spage>15562</spage><epage>15576</epage><pages>15562-15576</pages><issn>0162-8828</issn><eissn>2160-9292</eissn><eissn>1939-3539</eissn><coden>ITPIDJ</coden><abstract>In this work, we present SceneDreamer , an unconditional generative model for unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random noise. Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations. At the core of SceneDreamer is a principled learning paradigm comprising: 1) an efficient yet expressive 3D scene representation, 2) a generative scene parameterization, and 3) an effective renderer that can leverage the knowledge from 2D images. Our approach begins with an efficient bird's-eye-view (BEV) representation generated from simplex noise, which includes a height field for surface elevation and a semantic field for detailed scene semantics. This BEV scene representation enables: 1) representing a 3D scene with quadratic complexity, 2) disentangled geometry and semantics, and 3) efficient training. Moreover, we propose a novel generative neural hash grid to parameterize the latent space based on 3D positions and scene semantics, aiming to encode generalizable features across various scenes. Lastly, a neural volumetric renderer, learned from 2D image collections through adversarial training, is employed to produce photorealistic images. Extensive experiments demonstrate the effectiveness of SceneDreamer and superiority over state-of-the-art methods in generating vivid yet diverse unbounded 3D worlds.</abstract><cop>New York</cop><pub>IEEE</pub><pmid>37788193</pmid><doi>10.1109/TPAMI.2023.3321857</doi><tpages>15</tpages><orcidid>https://orcid.org/0000-0002-4220-5958</orcidid><orcidid>https://orcid.org/0000-0002-6627-814X</orcidid><orcidid>https://orcid.org/0000-0003-3998-7044</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 0162-8828
ispartof	IEEE transactions on pattern analysis and machine intelligence, 2023-12, Vol.45 (12), p.15562-15576
issn	0162-8828 2160-9292 1939-3539
language	eng
recordid	cdi_ieee_primary_10269790
source	IEEE Electronic Library (IEL)
subjects	3D generative model; Annotations; Cameras; GAN; Geometry; neural rendering; Parameterization; Random noise; Renderers; Rendering (computer graphics); Representations; Scene generation; Semantics; Solid modeling; Three dimensional models; Three-dimensional displays; Training; unbounded scene generation
title	SceneDreamer: Unbounded 3D Scene Generation From 2D Image Collections
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-27T23%3A41%3A10IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SceneDreamer:%20Unbounded%203D%20Scene%20Generation%20From%202D%20Image%20Collections&rft.jtitle=IEEE%20transactions%20on%20pattern%20analysis%20and%20machine%20intelligence&rft.au=Chen,%20Zhaoxi&rft.date=2023-12-01&rft.volume=45&rft.issue=12&rft.spage=15562&rft.epage=15576&rft.pages=15562-15576&rft.issn=0162-8828&rft.eissn=2160-9292&rft.coden=ITPIDJ&rft_id=info:doi/10.1109/TPAMI.2023.3321857&rft_dat=%3Cproquest_RIE%3E2872808173%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2885654584&rft_id=info:pmid/37788193&rft_ieee_id=10269790&rfr_iscdi=true
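
Illustrative note on the abstract: the description above says the scene is represented as a bird's-eye-view (BEV) pair of 2D maps, a height field and a semantic field, generated from noise. The following is only a minimal sketch of that idea and not the authors' implementation. It assumes bilinearly interpolated value noise as a stand-in for simplex noise, and the function names, class labels, and thresholds are all hypothetical.

```python
import random


def value_noise_grid(size, cells, seed):
    """Return a size x size grid of smooth noise values in [0, 1].

    Assumption: bilinearly interpolated value noise is used here as a
    simple stand-in for the simplex noise named in the abstract.
    """
    rng = random.Random(seed)
    # Random values on a coarse (cells + 1) x (cells + 1) lattice.
    lattice = [[rng.random() for _ in range(cells + 1)] for _ in range(cells + 1)]
    grid = []
    for i in range(size):
        y = i * cells / size
        y0 = int(y)
        ty = y - y0
        row = []
        for j in range(size):
            x = j * cells / size
            x0 = int(x)
            tx = x - x0
            # Interpolate along x on the two neighbouring lattice rows, then along y.
            top = lattice[y0][x0] * (1 - tx) + lattice[y0][x0 + 1] * tx
            bottom = lattice[y0 + 1][x0] * (1 - tx) + lattice[y0 + 1][x0 + 1] * tx
            row.append(top * (1 - ty) + bottom * ty)
        grid.append(row)
    return grid


def make_bev_scene(size=64, cells=4, seed=0):
    """Build a toy BEV scene: a height map plus a semantic label map.

    Hypothetical choice: the semantic map comes from a second, independent
    noise map, so geometry and semantics are stored as separate 2D fields.
    """
    height = value_noise_grid(size, cells, seed)
    category = value_noise_grid(size, cells, seed + 1)

    def label(value):
        # Hypothetical classes and thresholds, for illustration only.
        if value < 0.35:
            return "water"
        if value < 0.65:
            return "grass"
        return "rock"

    semantic = [[label(v) for v in row] for row in category]
    return height, semantic


def storage_cells(size):
    """Compare map cells (quadratic in size) with a dense voxel grid (cubic)."""
    return {"bev_maps": 2 * size * size, "dense_voxels": size ** 3}
```

For a grid of side N, the two maps hold 2·N² values while a dense voxel grid would hold N³, which is one way to read the "quadratic complexity" claim in the abstract.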