GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction
3D occupancy prediction is important for autonomous driving due to its comprehensive perception of the surroundings. To incorporate sequential inputs, most existing methods fuse representations from previous frames to infer the current 3D occupancy. However, they fail to consider the continuity of d...
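Main authors: | Zuo, Sicheng; Zheng, Wenzhao; Huang, Yuanhui; Zhou, Jie; Lu, Jiwen |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning |
Online access: | Order full text |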
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Zuo, Sicheng ; Zheng, Wenzhao ; Huang, Yuanhui ; Zhou, Jie ; Lu, Jiwen |
description | 3D occupancy prediction is important for autonomous driving due to its
comprehensive perception of the surroundings. To incorporate sequential inputs,
most existing methods fuse representations from previous frames to infer the
current 3D occupancy. However, they fail to consider the continuity of driving
scenarios and ignore the strong prior provided by the evolution of 3D scenes
(e.g., only dynamic objects move). In this paper, we propose a
world-model-based framework to exploit the scene evolution for perception. We
reformulate 3D occupancy prediction as a 4D occupancy forecasting problem
conditioned on the current sensor input. We decompose the scene evolution into
three factors: 1) ego motion alignment of static scenes; 2) local movements of
dynamic objects; and 3) completion of newly-observed scenes. We then employ a
Gaussian world model (GaussianWorld) to explicitly exploit these priors and
infer the scene evolution in the 3D Gaussian space considering the current RGB
observation. We evaluate the effectiveness of our framework on the widely used
nuScenes dataset. Our GaussianWorld improves the performance of the
single-frame counterpart by over 2% in mIoU without introducing additional
computations. Code: https://github.com/zuosc19/GaussianWorld. |
doi_str_mv | 10.48550/arxiv.2412.10373 |
format | Article |
fullrecord | <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_2412_10373</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2412_10373</sourcerecordid><originalsourceid>FETCH-arxiv_primary_2412_103733</originalsourceid><addsrcrecordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjE00jM0MDY35mTwdE8sLS7OTMwLzy_KSbFSgHEVwHwF3_yU1ByFtPwiheCSotTE3My8dAVjFwX_5OTSgsS85EqFgKLUlMzkksz8PB4G1rTEnOJUXijNzSDv5hri7KELtjS-oCgzN7GoMh5keTzYcmPCKgD9kjmR</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction</title><source>arXiv.org</source><creator>Zuo, Sicheng ; Zheng, Wenzhao ; Huang, Yuanhui ; Zhou, Jie ; Lu, Jiwen</creator><creatorcontrib>Zuo, Sicheng ; Zheng, Wenzhao ; Huang, Yuanhui ; Zhou, Jie ; Lu, Jiwen</creatorcontrib><description>3D occupancy prediction is important for autonomous driving due to its
comprehensive perception of the surroundings. To incorporate sequential inputs,
most existing methods fuse representations from previous frames to infer the
current 3D occupancy. However, they fail to consider the continuity of driving
scenarios and ignore the strong prior provided by the evolution of 3D scenes
(e.g., only dynamic objects move). In this paper, we propose a
world-model-based framework to exploit the scene evolution for perception. We
reformulate 3D occupancy prediction as a 4D occupancy forecasting problem
conditioned on the current sensor input. We decompose the scene evolution into
three factors: 1) ego motion alignment of static scenes; 2) local movements of
dynamic objects; and 3) completion of newly-observed scenes. We then employ a
Gaussian world model (GaussianWorld) to explicitly exploit these priors and
infer the scene evolution in the 3D Gaussian space considering the current RGB
observation. We evaluate the effectiveness of our framework on the widely used
nuScenes dataset. Our GaussianWorld improves the performance of the
single-frame counterpart by over 2% in mIoU without introducing additional
computations. Code: https://github.com/zuosc19/GaussianWorld.</description><identifier>DOI: 10.48550/arxiv.2412.10373</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Learning</subject><creationdate>2024-12</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/2412.10373$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.2412.10373$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Zuo, Sicheng</creatorcontrib><creatorcontrib>Zheng, Wenzhao</creatorcontrib><creatorcontrib>Huang, Yuanhui</creatorcontrib><creatorcontrib>Zhou, Jie</creatorcontrib><creatorcontrib>Lu, Jiwen</creatorcontrib><title>GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction</title><description>3D occupancy prediction is important for autonomous driving due to its
comprehensive perception of the surroundings. To incorporate sequential inputs,
most existing methods fuse representations from previous frames to infer the
current 3D occupancy. However, they fail to consider the continuity of driving
scenarios and ignore the strong prior provided by the evolution of 3D scenes
(e.g., only dynamic objects move). In this paper, we propose a
world-model-based framework to exploit the scene evolution for perception. We
reformulate 3D occupancy prediction as a 4D occupancy forecasting problem
conditioned on the current sensor input. We decompose the scene evolution into
three factors: 1) ego motion alignment of static scenes; 2) local movements of
dynamic objects; and 3) completion of newly-observed scenes. We then employ a
Gaussian world model (GaussianWorld) to explicitly exploit these priors and
infer the scene evolution in the 3D Gaussian space considering the current RGB
observation. We evaluate the effectiveness of our framework on the widely used
nuScenes dataset. Our GaussianWorld improves the performance of the
single-frame counterpart by over 2% in mIoU without introducing additional
computations. Code: https://github.com/zuosc19/GaussianWorld.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Computer Vision and Pattern Recognition</subject><subject>Computer Science - Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNpjYJA0NNAzsTA1NdBPLKrILNMzMjE00jM0MDY35mTwdE8sLS7OTMwLzy_KSbFSgHEVwHwF3_yU1ByFtPwiheCSotTE3My8dAVjFwX_5OTSgsS85EqFgKLUlMzkksz8PB4G1rTEnOJUXijNzSDv5hri7KELtjS-oCgzN7GoMh5keTzYcmPCKgD9kjmR</recordid><startdate>20241213</startdate><enddate>20241213</enddate><creator>Zuo, Sicheng</creator><creator>Zheng, Wenzhao</creator><creator>Huang, Yuanhui</creator><creator>Zhou, Jie</creator><creator>Lu, Jiwen</creator><scope>AKY</scope><scope>GOX</scope></search><sort><creationdate>20241213</creationdate><title>GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction</title><author>Zuo, Sicheng ; Zheng, Wenzhao ; Huang, Yuanhui ; Zhou, Jie ; Lu, Jiwen</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-arxiv_primary_2412_103733</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Computer Vision and Pattern Recognition</topic><topic>Computer Science - Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Zuo, Sicheng</creatorcontrib><creatorcontrib>Zheng, Wenzhao</creatorcontrib><creatorcontrib>Huang, Yuanhui</creatorcontrib><creatorcontrib>Zhou, Jie</creatorcontrib><creatorcontrib>Lu, Jiwen</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Zuo, Sicheng</au><au>Zheng, Wenzhao</au><au>Huang, 
Yuanhui</au><au>Zhou, Jie</au><au>Lu, Jiwen</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction</atitle><date>2024-12-13</date><risdate>2024</risdate><abstract>3D occupancy prediction is important for autonomous driving due to its
comprehensive perception of the surroundings. To incorporate sequential inputs,
most existing methods fuse representations from previous frames to infer the
current 3D occupancy. However, they fail to consider the continuity of driving
scenarios and ignore the strong prior provided by the evolution of 3D scenes
(e.g., only dynamic objects move). In this paper, we propose a
world-model-based framework to exploit the scene evolution for perception. We
reformulate 3D occupancy prediction as a 4D occupancy forecasting problem
conditioned on the current sensor input. We decompose the scene evolution into
three factors: 1) ego motion alignment of static scenes; 2) local movements of
dynamic objects; and 3) completion of newly-observed scenes. We then employ a
Gaussian world model (GaussianWorld) to explicitly exploit these priors and
infer the scene evolution in the 3D Gaussian space considering the current RGB
observation. We evaluate the effectiveness of our framework on the widely used
nuScenes dataset. Our GaussianWorld improves the performance of the
single-frame counterpart by over 2% in mIoU without introducing additional
computations. Code: https://github.com/zuosc19/GaussianWorld.</abstract><doi>10.48550/arxiv.2412.10373</doi><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2412.10373 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2412_10373 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence ; Computer Science - Computer Vision and Pattern Recognition ; Computer Science - Learning |
title | GaussianWorld: Gaussian World Model for Streaming 3D Occupancy Prediction |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-21T08%3A21%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=GaussianWorld:%20Gaussian%20World%20Model%20for%20Streaming%203D%20Occupancy%20Prediction&rft.au=Zuo,%20Sicheng&rft.date=2024-12-13&rft_id=info:doi/10.48550/arxiv.2412.10373&rft_dat=%3Carxiv_GOX%3E2412_10373%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |