UniScene: Unified Occupancy-centric Driving Scene Generation

Bibliographic Details
Main authors: Li, Bohan; Guo, Jiazhe; Liu, Hongsi; Zou, Yingshuang; Ding, Yikang; Chen, Xiwu; Zhu, Hu; Tan, Feiyang; Zhang, Chi; Wang, Tiancai; Zhou, Shuchang; Zhang, Li; Qi, Xiaojuan; Zhao, Hao; Yang, Mu; Zeng, Wenjun; Jin, Xin
Format: Article
Language: English
description Generating high-fidelity, controllable, and annotated training data is critical for autonomous driving. Existing methods typically generate a single data form directly from a coarse scene layout, which not only fails to output rich data forms required for diverse downstream tasks but also struggles to model the direct layout-to-data distribution. In this paper, we introduce UniScene, the first unified framework for generating three key data forms - semantic occupancy, video, and LiDAR - in driving scenes. UniScene employs a progressive generation process that decomposes the complex task of scene generation into two hierarchical steps: (a) first generating semantic occupancy from a customized scene layout as a meta scene representation rich in both semantic and geometric information, and then (b) conditioned on occupancy, generating video and LiDAR data, respectively, with two novel transfer strategies of Gaussian-based Joint Rendering and Prior-guided Sparse Modeling. This occupancy-centric approach reduces the generation burden, especially for intricate scenes, while providing detailed intermediate representations for the subsequent generation stages. Extensive experiments demonstrate that UniScene outperforms previous SOTAs in the occupancy, video, and LiDAR generation, which also indeed benefits downstream driving tasks.
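
For orientation, the occupancy-centric pipeline described in the abstract can be sketched in code. The sketch below is purely illustrative and is not the authors' released implementation: every class, function, and tensor shape (OccupancyGenerator, VideoGenerator, LiDARGenerator, generate_scene, the grid and frame sizes) is a hypothetical placeholder, and the stub generators merely stand in for the learned models and the paper's Gaussian-based Joint Rendering and Prior-guided Sparse Modeling transfer strategies.

import numpy as np


class OccupancyGenerator:
    """Stage (a): coarse scene layout -> semantic occupancy grid (hypothetical stub)."""

    def __init__(self, grid_shape=(200, 200, 16)):
        self.grid_shape = grid_shape

    def generate(self, layout):
        # A learned model would condition on the BEV map and 3D boxes in `layout`;
        # here an all-"free" grid is returned purely as a placeholder.
        return np.zeros(self.grid_shape, dtype=np.int64)


class VideoGenerator:
    """Stage (b), video branch: occupancy -> multi-view frames.
    Stands in for the occupancy-conditioned video model (Gaussian-based Joint Rendering)."""

    def generate(self, occupancy, num_frames=8, num_cams=6):
        # Placeholder output; a real model would render photorealistic frames.
        return np.zeros((num_frames, num_cams, 3, 256, 448), dtype=np.float32)


class LiDARGenerator:
    """Stage (b), LiDAR branch: occupancy -> point cloud.
    Stands in for the occupancy-conditioned LiDAR model (Prior-guided Sparse Modeling)."""

    def generate(self, occupancy, num_points=32768):
        # Placeholder output: rows of (x, y, z, intensity).
        return np.zeros((num_points, 4), dtype=np.float32)


def generate_scene(layout):
    """Occupancy-centric decomposition: layout -> occupancy -> {video, LiDAR}."""
    occupancy = OccupancyGenerator().generate(layout)  # step (a): meta scene representation
    video = VideoGenerator().generate(occupancy)       # step (b): video conditioned on occupancy
    lidar = LiDARGenerator().generate(occupancy)       # step (b): LiDAR conditioned on occupancy
    return {"occupancy": occupancy, "video": video, "lidar": lidar}


if __name__ == "__main__":
    # A real layout would hold a BEV semantic map plus 3D object boxes.
    dummy_layout = {"bev_map": np.zeros((200, 200), dtype=np.int64), "boxes_3d": []}
    outputs = generate_scene(dummy_layout)
    print({name: array.shape for name, array in outputs.items()})

The structural point is that both the video and LiDAR branches consume the same intermediate semantic occupancy rather than the raw layout, which is the decomposition the abstract argues reduces the generation burden.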
doi 10.48550/arxiv.2412.05435
creationdate 2024-12-06
rights http://creativecommons.org/licenses/by/4.0
source arXiv.org
subjects Computer Science - Computer Vision and Pattern Recognition
url https://arxiv.org/abs/2412.05435