MR-Occ: Efficient Camera-LiDAR 3D Semantic Occupancy Prediction Using Hierarchical Multi-Resolution Voxel Representation
Saved in:
Main authors: , , , ,
Format: Article
Language: eng
Subjects:
Online access: Order full text
Abstract: Accurate 3D perception is essential for understanding the environment in autonomous driving. Recent advancements in 3D semantic occupancy prediction have leveraged camera-LiDAR fusion to improve robustness and accuracy. However, current methods allocate computational resources uniformly across all voxels, leading to inefficiency, and they also fail to adequately address occlusions, resulting in reduced accuracy in challenging scenarios. We propose MR-Occ, a novel approach for camera-LiDAR fusion-based 3D semantic occupancy prediction, addressing these challenges through three key components: Hierarchical Voxel Feature Refinement (HVFR), Multi-scale Occupancy Decoder (MOD), and Pixel to Voxel Fusion Network (PVF-Net). HVFR improves performance by enhancing features for critical voxels, reducing computational cost. MOD introduces an 'occluded' class to better handle regions obscured from the sensors' view, improving accuracy. PVF-Net leverages densified LiDAR features to effectively fuse camera and LiDAR data through a deformable attention mechanism. Extensive experiments demonstrate that MR-Occ achieves state-of-the-art performance on the nuScenes-Occupancy dataset, surpassing previous approaches by +5.2% in IoU and +5.3% in mIoU while using fewer parameters and FLOPs. Moreover, MR-Occ demonstrates superior performance on the SemanticKITTI dataset, further validating its effectiveness and generalizability across diverse 3D semantic occupancy benchmarks.
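
The fusion idea the abstract attributes to PVF-Net (camera features attended into LiDAR-derived voxel features via deformable attention) can be illustrated with a minimal PyTorch sketch. Everything below is a hypothetical illustration, not the paper's implementation: the class name `PixelToVoxelFusion`, the offset scale, and all shapes and hyperparameters are assumptions. Each voxel query samples the camera feature map at a few learned offsets around its projected pixel location and aggregates the samples with learned attention weights, in the spirit of deformable attention.

```python
# Hypothetical sketch of pixel-to-voxel deformable fusion; names,
# shapes, and hyperparameters are assumptions, not the MR-Occ code.
import torch
import torch.nn as nn
import torch.nn.functional as F

class PixelToVoxelFusion(nn.Module):
    """Fuse camera features into voxel queries by sampling the image
    feature map at learned offsets around each voxel's projected pixel."""
    def __init__(self, dim: int = 64, num_points: int = 4):
        super().__init__()
        self.num_points = num_points
        # Per-query 2D offsets and attention weights for each sample point.
        self.offset = nn.Linear(dim, num_points * 2)
        self.weight = nn.Linear(dim, num_points)
        self.proj = nn.Linear(dim, dim)

    def forward(self, voxel_feats, ref_uv, img_feats):
        # voxel_feats: (B, N, C) LiDAR-derived voxel queries
        # ref_uv:      (B, N, 2) projected pixel coords in [-1, 1]
        # img_feats:   (B, C, H, W) camera feature map
        B, N, C = voxel_feats.shape
        offsets = self.offset(voxel_feats).view(B, N, self.num_points, 2)
        weights = self.weight(voxel_feats).softmax(dim=-1)          # (B, N, P)
        # Sample image features at the reference points plus learned offsets.
        locs = (ref_uv.unsqueeze(2) + 0.1 * offsets).clamp(-1, 1)   # (B, N, P, 2)
        sampled = F.grid_sample(img_feats, locs, align_corners=False)  # (B, C, N, P)
        sampled = sampled.permute(0, 2, 3, 1)                       # (B, N, P, C)
        fused = (weights.unsqueeze(-1) * sampled).sum(dim=2)        # (B, N, C)
        # Residual update of the voxel queries with the fused camera features.
        return voxel_feats + self.proj(fused)

if __name__ == "__main__":
    B, N, C, H, W = 2, 128, 64, 32, 88
    fuse = PixelToVoxelFusion(dim=C)
    out = fuse(torch.randn(B, N, C),
               torch.rand(B, N, 2) * 2 - 1,
               torch.randn(B, C, H, W))
    print(out.shape)  # torch.Size([2, 128, 64])
```

Restricting sampling to a handful of offset points per voxel query, rather than attending over the full image, is what keeps deformable attention cheap; the abstract's efficiency claims (fewer parameters and FLOPs) are consistent with this style of sparse cross-modal attention.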
DOI: 10.48550/arxiv.2412.20480