UniScene: Unified Occupancy-centric Driving Scene Generation
| Main Authors: | |
|---|---|
| Format: | Article |
| Language: | English |
| Subjects: | |
| Online Access: | Order full text |
Summary: Generating high-fidelity, controllable, and annotated training data is critical for autonomous driving. Existing methods typically generate a single data form directly from a coarse scene layout, which not only fails to output the rich data forms required for diverse downstream tasks but also struggles to model the direct layout-to-data distribution. In this paper, we introduce UniScene, the first unified framework for generating three key data forms - semantic occupancy, video, and LiDAR - in driving scenes. UniScene employs a progressive generation process that decomposes the complex task of scene generation into two hierarchical steps: (a) first generating semantic occupancy from a customized scene layout as a meta scene representation rich in both semantic and geometric information, and then (b) conditioned on occupancy, generating video and LiDAR data, respectively, with two novel transfer strategies of Gaussian-based Joint Rendering and Prior-guided Sparse Modeling. This occupancy-centric approach reduces the generation burden, especially for intricate scenes, while providing detailed intermediate representations for the subsequent generation stages. Extensive experiments demonstrate that UniScene outperforms previous SOTAs in occupancy, video, and LiDAR generation, which in turn benefits downstream driving tasks.
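As a rough illustration of the occupancy-centric pipeline described in the summary, the sketch below outlines the two hierarchical stages in Python. All class and function names (`OccupancyGenerator`, `VideoDecoder`, `LiDARDecoder`, `generate_scene`) are hypothetical placeholders rather than the authors' released API; the sketch only mirrors the data flow layout → occupancy → {video, LiDAR}, with the actual generative models replaced by stubs.

```python
# Hypothetical sketch of UniScene's two-stage, occupancy-centric data flow.
# None of these classes come from a released library; they only illustrate
# the layout -> occupancy -> {video, LiDAR} decomposition from the abstract.

import numpy as np


class OccupancyGenerator:
    """Stage (a): map a coarse BEV scene layout to a semantic occupancy grid."""

    def generate(self, bev_layout: np.ndarray) -> np.ndarray:
        # A real model would run a conditional generative network here;
        # an empty semantic voxel grid of shape (X, Y, Z) serves as a stand-in.
        return np.zeros((200, 200, 16), dtype=np.int64)


class VideoDecoder:
    """Stage (b1): render occupancy into camera video, in the paper via
    Gaussian-based Joint Rendering (stubbed here)."""

    def generate(self, occupancy: np.ndarray, num_frames: int = 8) -> np.ndarray:
        # Placeholder video tensor: (frames, channels, height, width).
        return np.zeros((num_frames, 3, 256, 512), dtype=np.float32)


class LiDARDecoder:
    """Stage (b2): turn occupancy into a LiDAR point cloud, in the paper via
    Prior-guided Sparse Modeling (stubbed here)."""

    def generate(self, occupancy: np.ndarray) -> np.ndarray:
        # Placeholder point cloud: (num_points, [x, y, z, intensity]).
        return np.zeros((30_000, 4), dtype=np.float32)


def generate_scene(bev_layout: np.ndarray) -> dict:
    """Occupancy is generated first; video and LiDAR are conditioned on it."""
    occ = OccupancyGenerator().generate(bev_layout)
    return {
        "occupancy": occ,
        "video": VideoDecoder().generate(occ),
        "lidar": LiDARDecoder().generate(occ),
    }
```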
DOI: 10.48550/arxiv.2412.05435