SceneDreamer: Unbounded 3D Scene Generation From 2D Image Collections

In this work, we present SceneDreamer, an unconditional generative model for unbounded 3D scenes, which synthesizes large-scale 3D landscapes from random noise. Our framework is learned from in-the-wild 2D image collections only, without any 3D annotations. At the core of SceneDreamer is a principled learning paradigm comprising: 1) an efficient yet expressive 3D scene representation, 2) a generative scene parameterization, and 3) an effective renderer that can leverage the knowledge from 2D images. Our approach begins with an efficient bird's-eye-view (BEV) representation generated from simplex noise, which includes a height field for surface elevation and a semantic field for detailed scene semantics. This BEV scene representation enables: 1) representing a 3D scene with quadratic complexity, 2) disentangled geometry and semantics, and 3) efficient training. Moreover, we propose a novel generative neural hash grid to parameterize the latent space based on 3D positions and scene semantics, aiming to encode generalizable features across various scenes. Lastly, a neural volumetric renderer, learned from 2D image collections through adversarial training, is employed to produce photorealistic images. Extensive experiments demonstrate the effectiveness of SceneDreamer and its superiority over state-of-the-art methods in generating vivid yet diverse unbounded 3D worlds.
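
As an aside to the abstract: the BEV representation it describes (a height field for elevation plus a semantic field for scene labels, both on a 2D grid) can be illustrated compactly. The following Python sketch is illustrative only, not the authors' code; fractal value noise stands in for the simplex noise named above, and the fractal_noise helper and the two-threshold semantic labeling are assumptions made for this example.

    import numpy as np

    def fractal_noise(shape, octaves=4, seed=0):
        # Fractal value noise: a stand-in for the simplex noise used in
        # the paper (illustrative only, not the authors' generator).
        rng = np.random.default_rng(seed)
        h, w = shape
        out = np.zeros(shape)
        amp, freq = 1.0, 4
        for _ in range(octaves):
            coarse = rng.random((freq + 1, freq + 1))  # coarse random lattice
            ys = np.linspace(0.0, freq, h)
            xs = np.linspace(0.0, freq, w)
            y0, x0 = ys.astype(int), xs.astype(int)
            y1, x1 = np.minimum(y0 + 1, freq), np.minimum(x0 + 1, freq)
            ty, tx = (ys - y0)[:, None], (xs - x0)[None, :]
            # Bilinearly upsample the coarse lattice to the target grid.
            top = coarse[y0][:, x0] * (1 - tx) + coarse[y0][:, x1] * tx
            bot = coarse[y1][:, x0] * (1 - tx) + coarse[y1][:, x1] * tx
            out += amp * (top * (1 - ty) + bot * ty)
            amp *= 0.5
            freq *= 2
        return out / out.max()

    # BEV representation: both fields live on a 2D grid, so the scene is
    # stored with quadratic rather than cubic complexity, and geometry
    # (height) is disentangled from semantics (labels).
    height_field = fractal_noise((256, 256), seed=1)   # surface elevation
    moisture = fractal_noise((256, 256), seed=2)       # auxiliary field
    semantic_field = np.select(
        [height_field > 0.75, moisture > 0.5],
        [2, 1],        # hypothetical labels: 2 = rock/snow, 1 = forest
        default=0)     # 0 = grassland

Lifting this grid into 3D is then a lookup: a 3D point is below the surface wherever its height is under height_field at its (x, y) cell, and it inherits that cell's semantic label.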

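The generative neural hash grid and the volumetric renderer from the abstract admit a similar sketch. Below, an XOR-of-primes spatial hash in the style of common multi-resolution hash encodings indexes a feature table by integer 3D grid coordinates plus a semantic label; hashing the label as a fourth coordinate, the prime constants, and the table size are all illustrative assumptions rather than the paper's actual design. The final lines compute the standard volume-rendering weights w_i = T_i * (1 - exp(-sigma_i * delta_i)) that a neural volumetric renderer uses to composite per-sample outputs into a pixel.

    import numpy as np

    TABLE_SIZE = 2 ** 14   # hypothetical table size
    FEAT_DIM = 4           # hypothetical feature width
    # XOR-of-primes spatial hash constants, as in common hash-grid encodings.
    PRIMES = np.array([1, 2654435761, 805459861, 3674653429], dtype=np.uint64)

    rng = np.random.default_rng(0)
    feature_table = rng.normal(size=(TABLE_SIZE, FEAT_DIM)).astype(np.float32)

    def hash_features(xyz, label):
        # Hash integer grid coordinates (N, 3) plus a semantic label into
        # table indices, then gather the learned features.
        n = len(xyz)
        coords = np.concatenate(
            [xyz.astype(np.uint64), np.full((n, 1), label, dtype=np.uint64)],
            axis=1)
        h = np.zeros(n, dtype=np.uint64)
        for i in range(4):
            h ^= coords[:, i] * PRIMES[i]   # uint64 arithmetic wraps, as intended
        return feature_table[h % TABLE_SIZE]

    feats = hash_features(np.array([[3, 1, 4], [1, 5, 9]]), label=2)

    # Volume rendering along one ray: alpha_i = 1 - exp(-sigma_i * delta_i),
    # transmittance T_i = prod_{j<i} (1 - alpha_j), weight w_i = T_i * alpha_i.
    sigma = np.abs(rng.normal(size=64))   # sampled densities along the ray
    delta = np.full(64, 0.05)             # distances between samples
    alpha = 1.0 - np.exp(-sigma * delta)
    trans = np.cumprod(np.concatenate(([1.0], 1.0 - alpha[:-1])))
    weights = trans * alpha               # per-sample compositing weights

Because the hash keys include the semantic label, points with the same position but different semantics fetch different features, which is the intuition behind conditioning the latent parameterization on both 3D position and scene semantics.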
Bibliographic Details

Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2023-12, Vol. 45 (12), pp. 15562-15576
Authors: Chen, Zhaoxi; Wang, Guangcong; Liu, Ziwei
Format: Article
Language: English
Publisher: IEEE, New York
DOI: 10.1109/TPAMI.2023.3321857
ISSN: 0162-8828
EISSN: 2160-9292; 1939-3539
PMID: 37788193
Subjects: 3D generative model; Annotations; Cameras; GAN; Geometry; neural rendering; Parameterization; Random noise; Renderers; Rendering (computer graphics); Representations; Scene generation; Semantics; Solid modeling; Three-dimensional models; Three-dimensional displays; Training; unbounded scene generation