SimGen: Simulator-conditioned Driving Scene Generation
Format: Article
Language: English
Online Access: Order full text
Abstract: Controllable synthetic data generation can substantially lower the annotation
cost of training data. Prior works use diffusion models to generate driving
images conditioned on the 3D object layout. However, those models are trained
on small-scale datasets like nuScenes, which lack appearance and layout
diversity. Moreover, these models often overfit: they can only generate images
from layout data drawn from the validation set of the same dataset. In this
work, we introduce a simulator-conditioned scene
generation framework called SimGen that can learn to generate diverse driving
scenes by mixing data from the simulator and the real world. It uses a novel
cascade diffusion pipeline to address challenging sim-to-real gaps and
multi-condition conflicts. To enhance the generative diversity of SimGen, we
collect DIVA, a driving video dataset containing over 147.5 hours of real-world
driving videos from 73 locations worldwide and simulated driving data from the
MetaDrive simulator. SimGen achieves superior generation quality
and diversity while preserving controllability based on the text prompt and the
layout pulled from a simulator. We further demonstrate the improvements brought
by SimGen for synthetic data augmentation on the BEV detection and segmentation
task and showcase its capability in safety-critical data generation.
DOI: 10.48550/arxiv.2406.09386
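
The abstract describes a cascade diffusion pipeline that first bridges the sim-to-real gap by converting simulator-rendered conditions into realistic intermediate condition maps, and then generates the driving image from those maps together with a text prompt. Below is a minimal, hypothetical PyTorch sketch of that two-stage conditioning idea; the module names, channel counts, toy denoisers, and simplified DDPM sampler are illustrative assumptions and do not reflect SimGen's actual architecture or the DIVA data format.

```python
# Hypothetical sketch of a two-stage ("cascade") conditional diffusion pipeline
# in the spirit of simulator-conditioned generation. All shapes and networks
# are placeholders, not the authors' implementation.
import torch
import torch.nn as nn

class TinyDenoiser(nn.Module):
    """Toy denoiser: predicts noise from a noisy image plus conditioning maps."""
    def __init__(self, in_ch, cond_ch):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(in_ch + cond_ch, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, 64, 3, padding=1), nn.SiLU(),
            nn.Conv2d(64, in_ch, 3, padding=1),
        )

    def forward(self, x_t, cond):
        return self.net(torch.cat([x_t, cond], dim=1))

@torch.no_grad()
def ddpm_sample(denoiser, cond, shape, steps=50):
    """Minimal DDPM-style reverse process with a simple linear beta schedule."""
    betas = torch.linspace(1e-4, 0.02, steps)
    alphas = 1.0 - betas
    alpha_bars = torch.cumprod(alphas, dim=0)
    x = torch.randn(shape)
    for t in reversed(range(steps)):
        eps = denoiser(x, cond)
        coef = betas[t] / torch.sqrt(1.0 - alpha_bars[t])
        x = (x - coef * eps) / torch.sqrt(alphas[t])
        if t > 0:
            x = x + torch.sqrt(betas[t]) * torch.randn_like(x)
    return x

# Stage 1: translate simulator conditions (e.g. a semantic map and depth
# rendered by a simulator such as MetaDrive) into realistic intermediate
# condition maps. Stage 2: generate the image from those maps plus a text
# embedding broadcast to spatial form. Channel counts are arbitrary.
sim_layout = torch.randn(1, 4, 64, 64)                       # placeholder simulator conditions
text_emb = torch.randn(1, 8, 1, 1).expand(-1, -1, 64, 64)    # placeholder text embedding

stage1 = TinyDenoiser(in_ch=3, cond_ch=4)        # sim layout -> realistic condition maps
stage2 = TinyDenoiser(in_ch=3, cond_ch=3 + 8)    # condition maps + text -> image

real_cond = ddpm_sample(stage1, sim_layout, shape=(1, 3, 64, 64))
image = ddpm_sample(stage2, torch.cat([real_cond, text_emb], dim=1),
                    shape=(1, 3, 64, 64))
print(image.shape)  # torch.Size([1, 3, 64, 64])
```

The two stages are chained so that the second diffusion model never sees raw simulator renderings directly, which is one plausible way to read the abstract's claim that the cascade addresses sim-to-real gaps and multi-condition conflicts.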