Deep Height Decoupling for Precise Vision-based 3D Occupancy Prediction
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | The task of vision-based 3D occupancy prediction aims to reconstruct 3D
geometry and estimate its semantic classes from 2D color images, where the
2D-to-3D view transformation is an indispensable step. Most previous methods
conduct forward projection, such as BEVPooling and VoxelPooling, both of which
map the 2D image features into 3D grids. However, the current grid representing
features within a certain height range usually introduces many confusing
features that belong to other height ranges. To address this challenge, we
present Deep Height Decoupling (DHD), a novel framework that incorporates
an explicit height prior to filter out the confusing features. Specifically, DHD
first predicts height maps via explicit supervision. Based on the height
distribution statistics, DHD designs Mask Guided Height Sampling (MGHS) to
adaptively decouple the height map into multiple binary masks. MGHS projects
the 2D image features into multiple subspaces, where each grid contains
features within reasonable height ranges. Finally, a Synergistic Feature
Aggregation (SFA) module is deployed to enhance the feature representation
through channel and spatial affinities, enabling further occupancy refinement.
On the popular Occ3D-nuScenes benchmark, our method achieves state-of-the-art
performance even with minimal input frames. Code is available at
https://github.com/yanzq95/DHD. |
---|---|
DOI: | 10.48550/arxiv.2409.07972 |
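
The summary describes decoupling a predicted height map into multiple binary masks, so that each mask keeps only the image features whose height falls in one range. The following is a minimal sketch of that idea, not the authors' implementation (see the repository linked above for that). The function names `decouple_height_map` and `mask_features`, the half-open bin convention, and the bin edges in the usage note are all assumptions made for illustration; the paper derives its ranges from height distribution statistics.

```python
from bisect import bisect_right
from typing import List, Sequence

Mask = List[List[int]]


def decouple_height_map(
    height_map: Sequence[Sequence[float]],
    bin_edges: Sequence[float],
) -> List[Mask]:
    """Split a per-pixel height map into one binary mask per height range.

    Bin k covers the half-open interval [bin_edges[k], bin_edges[k + 1]).
    Pixels whose height lies outside every bin are 0 in all masks.
    (Hypothetical helper; the bin convention is an assumption.)
    """
    if len(bin_edges) < 2 or any(a >= b for a, b in zip(bin_edges, bin_edges[1:])):
        raise ValueError("bin_edges must be strictly ascending with at least two values")

    num_bins = len(bin_edges) - 1
    masks = [[[0] * len(row) for row in height_map] for _ in range(num_bins)]
    for r, row in enumerate(height_map):
        for c, height in enumerate(row):
            # Index of the bin whose lower edge is the last one <= height.
            k = bisect_right(bin_edges, height) - 1
            if 0 <= k < num_bins:
                masks[k][r][c] = 1
    return masks


def mask_features(
    features: Sequence[Sequence[Sequence[float]]],
    mask: Mask,
) -> List[List[List[float]]]:
    """Zero out the feature vector of every pixel where the mask is 0."""
    return [
        [[value * m for value in feat] for feat, m in zip(feat_row, mask_row)]
        for feat_row, mask_row in zip(features, mask)
    ]
```

For example, with the illustrative edges `[-2.0, 0.0, 2.0, 4.0]`, calling `decouple_height_map([[-1.0, 0.5], [2.5, 7.0]], [-2.0, 0.0, 2.0, 4.0])` yields three masks, and the pixel at height 7.0 is excluded from all of them. Passing one mask to `mask_features` keeps only the features of that height range, which could then be projected into its own 3D subspace.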