HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving
Main authors: | , , , , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Generative models have significantly improved the quality of generation and
prediction for either camera images or LiDAR point clouds in autonomous driving.
However, a real-world autonomous driving system uses multiple input modalities,
typically cameras and LiDARs, which carry complementary information for generation;
existing generation methods ignore this crucial property, so their outputs cover
only separate 2D or 3D information. To fill the gap in 2D-3D multi-modal joint
generation for autonomous driving, this paper proposes the framework
\emph{HoloDrive}, which jointly generates camera images and LiDAR point clouds.
We employ BEV-to-Camera and Camera-to-BEV transform modules between the
heterogeneous generative models, and introduce a depth prediction branch in the
2D generative model to disambiguate the un-projection from image space to BEV
space. We then extend the method to future prediction by adding temporal
structure and a carefully designed progressive training scheme. Furthermore,
experiments on single-frame generation and world-model benchmarks demonstrate
that our method achieves significant gains over state-of-the-art methods on
generation metrics. |
DOI: | 10.48550/arxiv.2412.01407 |
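The abstract describes a Camera-to-BEV transform that relies on a predicted per-pixel depth distribution to disambiguate the un-projection from image space to BEV space. The record contains no code, so the following is only a minimal Lift-Splat-style sketch of that general idea; the class name `CameraToBEVLift`, the layer sizes, the categorical depth parameterization, and the precomputed `pixel_to_bev_index` mapping are illustrative assumptions, not HoloDrive's actual modules.

```python
import torch
import torch.nn as nn

class CameraToBEVLift(nn.Module):
    """Minimal sketch of depth-aware camera-to-BEV lifting (Lift-Splat style).

    All sizes and the pooling scheme are illustrative assumptions, not the
    HoloDrive implementation.
    """

    def __init__(self, feat_channels=64, num_depth_bins=48, bev_size=(128, 128)):
        super().__init__()
        self.num_depth_bins = num_depth_bins
        self.bev_size = bev_size
        # Depth prediction branch: a categorical distribution over depth bins
        # per image pixel, used to disambiguate the 2D-to-3D un-projection.
        self.depth_head = nn.Conv2d(feat_channels, num_depth_bins, kernel_size=1)

    def forward(self, img_feats, pixel_to_bev_index):
        """
        img_feats:          (B, C, H, W) image features from the 2D branch.
        pixel_to_bev_index: (B, D, H, W) long tensor mapping each (depth bin,
                            pixel) pair to a flattened BEV cell, assumed to be
                            precomputed from camera intrinsics/extrinsics.
        Returns BEV features of shape (B, C, bev_h, bev_w).
        """
        B, C, H, W = img_feats.shape
        bev_h, bev_w = self.bev_size

        # Per-pixel depth distribution decides which BEV cells a pixel feeds.
        depth_prob = self.depth_head(img_feats).softmax(dim=1)        # (B, D, H, W)

        # Outer product: weight image features by depth probability.
        lifted = depth_prob.unsqueeze(2) * img_feats.unsqueeze(1)     # (B, D, C, H, W)

        # Scatter-add ("splat") the lifted features into BEV cells.
        bev = img_feats.new_zeros(B, C, bev_h * bev_w)
        flat_feats = lifted.permute(0, 2, 1, 3, 4).reshape(B, C, -1)  # (B, C, D*H*W)
        flat_index = pixel_to_bev_index.reshape(B, 1, -1).expand(-1, C, -1)
        bev.scatter_add_(dim=2, index=flat_index, src=flat_feats)
        return bev.view(B, C, bev_h, bev_w)

# Example usage with random tensors (shapes only; the index map would normally
# come from camera geometry):
feats = torch.randn(2, 64, 32, 88)
idx = torch.randint(0, 128 * 128, (2, 48, 32, 88))
bev = CameraToBEVLift()(feats, idx)  # (2, 64, 128, 128)
```

The depth head here is the part the abstract emphasizes: without a per-pixel depth estimate, a single image feature could plausibly land in many BEV cells along its camera ray, so the predicted distribution is what makes the un-projection well defined in this sketch.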