SceneDiffuser: Efficient and Controllable Driving Simulation Initialization and Rollout

Bibliographic details
Main authors:	Jiang, Chiyu Max; Bai, Yijing; Cornman, Andre; Davis, Christopher; Huang, Xiukun; Jeon, Hong; Kulshrestha, Sakshum; Lambert, John; Li, Shuangyu; Zhou, Xuanyu; Fuertes, Carlos; Yuan, Chang; Tan, Mingxing; Zhou, Yin; Anguelov, Dragomir
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning

creator	Jiang, Chiyu Max; Bai, Yijing; Cornman, Andre; Davis, Christopher; Huang, Xiukun; Jeon, Hong; Kulshrestha, Sakshum; Lambert, John; Li, Shuangyu; Zhou, Xuanyu; Fuertes, Carlos; Yuan, Chang; Tan, Mingxing; Zhou, Yin; Anguelov, Dragomir
description	Realistic and interactive scene simulation is a key prerequisite for autonomous vehicle (AV) development. In this work, we present SceneDiffuser, a scene-level diffusion prior designed for traffic simulation. It offers a unified framework that addresses two key stages of simulation: scene initialization, which involves generating initial traffic layouts, and scene rollout, which encompasses the closed-loop simulation of agent behaviors. While diffusion models have been proven effective in learning realistic and multimodal agent distributions, several challenges remain, including controllability, maintaining realism in closed-loop simulations, and ensuring inference efficiency. To address these issues, we introduce amortized diffusion for simulation. This novel diffusion denoising paradigm amortizes the computational cost of denoising over future simulation steps, significantly reducing the cost per rollout step (16x fewer inference steps) while also mitigating closed-loop errors. We further enhance controllability through the introduction of generalized hard constraints, a simple yet effective inference-time constraint mechanism, as well as language-based constrained scene generation via few-shot prompting of a large language model (LLM). Our investigations into model scaling reveal that increased computational resources significantly improve overall simulation realism. We demonstrate the effectiveness of our approach on the Waymo Open Sim Agents Challenge, achieving top open-loop performance and the best closed-loop performance among diffusion models.
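The description's central idea, amortizing the cost of denoising over future simulation steps, can be illustrated with a toy sketch. This is not the authors' implementation: the linear noise schedule x_t = (1 - t) * x0 + t * eps, the window layout (slot k holds the state k + 1 steps ahead at noise level (k + 1) / H), the hypothetical `toy_denoiser` model, and the optional `constrain` hook (a hypothetical stand-in for an inference-time hard constraint applied to the clean estimate) are all assumptions made for illustration.

```python
import random
from typing import Callable, List, Optional

# Hypothetical denoiser signature: (noisy window, per-slot noise levels, executed history)
# -> estimate of the clean window. In a real system this would be the learned model.
Denoiser = Callable[[List[float], List[float], List[float]], List[float]]


def amortized_rollout(
    denoise: Denoiser,
    init_clean: List[float],
    history: List[float],
    num_steps: int,
    constrain: Optional[Callable[[List[float]], List[float]]] = None,
    seed: int = 0,
):
    """Toy amortized-diffusion rollout (illustrative sketch only).

    Assumes x_t = (1 - t) * x0 + t * eps. Slot k of the window sits at noise level
    (k + 1) / H, so slot 0 is one denoising step away from clean.
    Returns (executed_states, denoiser_calls).
    """
    rng = random.Random(seed)
    horizon = len(init_clean)
    history = list(history)
    # Warm start: noise an initial clean plan so that later slots are noisier.
    window = [
        (1 - (k + 1) / horizon) * x0 + ((k + 1) / horizon) * rng.gauss(0.0, 1.0)
        for k, x0 in enumerate(init_clean)
    ]
    executed, calls = [], 0
    levels = [(k + 1) / horizon for k in range(horizon)]
    for _ in range(num_steps):
        x0_hat = denoise(window, levels, history)  # the only model call this step
        calls += 1
        if constrain is not None:
            x0_hat = constrain(x0_hat)  # hypothetical hard constraint on the clean estimate
        # One deterministic step per slot: move from level (k + 1) / H to k / H.
        stepped = []
        for k, (x_t, x0, t) in enumerate(zip(window, x0_hat, levels)):
            eps_hat = (x_t - (1 - t) * x0) / t
            t_next = k / horizon  # slot 0 reaches exactly 0.0, i.e. clean
            stepped.append((1 - t_next) * x0 + t_next * eps_hat)
        # Execute the clean first slot, feed it back (closed loop), and shift the window.
        state = stepped[0]
        executed.append(state)
        history.append(state)
        window = stepped[1:] + [rng.gauss(0.0, 1.0)]  # new slot at level 1.0: pure noise
    return executed, calls


def toy_denoiser(window, levels, history):
    # Hypothetical stand-in for a learned model: ignores the noise and predicts a
    # constant-velocity (0.5 per step) continuation of the last executed state.
    last = history[-1]
    return [last + 0.5 * (k + 1) for k in range(len(window))]


states, calls = amortized_rollout(toy_denoiser, [0.0] * 16, [0.0], num_steps=4)
print(states, calls)  # [0.5, 1.0, 1.5, 2.0] 4

clamped, _ = amortized_rollout(
    toy_denoiser, [0.0] * 16, [0.0], num_steps=4,
    constrain=lambda xs: [min(x, 1.0) for x in xs],
)
print(clamped)  # [0.5, 1.0, 1.0, 1.0]
```

With a 16-slot window the sketch makes one denoiser call per rollout step, whereas denoising each step from scratch with a 16-step sampler would take 16 calls; that is the kind of ratio the 16x figure describes, although the paper's actual window length and sampler settings are not stated in this record.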
doi_str_mv	10.48550/arxiv.2412.12129
format	Article
creationdate	2024-12-05
rights	http://arxiv.org/licenses/nonexclusive-distrib/1.0
oa	free_for_read
linktorsrc	https://arxiv.org/abs/2412.12129
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.2412.12129
language	eng
recordid	cdi_arxiv_primary_2412_12129
source	arXiv.org
subjects	Computer Science - Artificial Intelligence; Computer Science - Computer Vision and Pattern Recognition; Computer Science - Learning
title	SceneDiffuser: Efficient and Controllable Driving Simulation Initialization and Rollout
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-28T09%3A41%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SceneDiffuser:%20Efficient%20and%20Controllable%20Driving%20Simulation%20Initialization%20and%20Rollout&rft.au=Jiang,%20Chiyu%20Max&rft.date=2024-12-05&rft_id=info:doi/10.48550/arxiv.2412.12129&rft_dat=%3Carxiv_GOX%3E2412_12129%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true