DVLO: Deep Visual-LiDAR Odometry with Local-to-Global Feature Fusion and Bi-Directional Structure Alignment
Saved in:

| | |
| --- | --- |
| Main Authors: | , , , , , , |
| Format: | Article |
| Language: | English |
| Online Access: | Order full text |
Summary: Visual and LiDAR data are highly complementary: images provide fine-grained texture, while point clouds provide rich geometric information. However, effective visual-LiDAR fusion remains challenging, mainly due to the intrinsic data-structure inconsistency between the two modalities: image pixels are regular and dense, whereas LiDAR points are unordered and sparse. To address this problem, we propose a local-to-global fusion network (DVLO) with bi-directional structure alignment. To obtain locally fused features, we project points onto the image plane as cluster centers and cluster image pixels around each center; the pixels are thereby pre-organized as pseudo points for image-to-point structure alignment. We then convert points to pseudo images by cylindrical projection (point-to-image structure alignment) and perform adaptive global feature fusion between the point features and the locally fused features. Our method achieves state-of-the-art performance on the KITTI odometry and FlyingThings3D scene flow datasets compared to both single-modal and multi-modal methods. Code is released at https://github.com/IRMVLab/DVLO.
DOI: 10.48550/arxiv.2403.18274
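
The abstract describes two structure-alignment directions: pixels gathered around projected points as pseudo points (image-to-point), and points rasterized into a pseudo image by cylindrical projection (point-to-image). The NumPy sketch below is a rough illustration of what those two steps could look like; the function names, the pinhole intrinsics `K`, the window size, and the projection resolution and field of view (roughly a 64-beam sensor, as on KITTI) are all assumptions for illustration, not values taken from the paper or the released code.

```python
import numpy as np

def project_to_pixels(points_cam, K):
    """Pinhole projection of (N, 3) camera-frame points onto the image
    plane; the projected locations serve as cluster centers for local
    fusion. Illustrative sketch assuming points lie in front of the
    camera (z > 0); K is a hypothetical 3x3 intrinsic matrix."""
    uv = (K @ points_cam.T).T            # (N, 3) homogeneous coordinates
    return uv[:, :2] / uv[:, 2:3]        # perspective divide -> (N, 2)

def gather_pseudo_points(image, centers, window=5):
    """Image-to-point structure alignment sketch: around each projected
    center, collect a small window of pixels as 'pseudo points' that
    carry their 2D coordinates plus channel values. Centers outside the
    image yield None."""
    H, W, C = image.shape
    r = window // 2
    clusters = []
    for cu, cv in np.round(centers).astype(int):
        if not (0 <= cu < W and 0 <= cv < H):
            clusters.append(None)
            continue
        u0, u1 = max(cu - r, 0), min(cu + r + 1, W)
        v0, v1 = max(cv - r, 0), min(cv + r + 1, H)
        vs, us = np.mgrid[v0:v1, u0:u1]
        patch = image[v0:v1, u0:u1].reshape(-1, C)
        coords = np.stack([us.ravel(), vs.ravel()], axis=1).astype(np.float32)
        clusters.append(np.concatenate([coords, patch], axis=1))
    return clusters

def cylindrical_projection(points, height=64, width=1800,
                           fov_up_deg=3.0, fov_down_deg=-25.0):
    """Point-to-image structure alignment sketch: map an (N, 3) LiDAR
    cloud to a dense 2D pseudo image by azimuth/elevation binning."""
    x, y, z = points[:, 0], points[:, 1], points[:, 2]
    depth = np.linalg.norm(points, axis=1)

    yaw = np.arctan2(y, x)                          # azimuth in [-pi, pi]
    u = ((yaw / np.pi + 1.0) / 2.0) * width         # azimuth -> column

    fov = np.deg2rad(fov_up_deg - fov_down_deg)
    pitch = np.arcsin(z / np.maximum(depth, 1e-8))  # elevation angle
    v = (1.0 - (pitch - np.deg2rad(fov_down_deg)) / fov) * height

    u = np.clip(np.floor(u), 0, width - 1).astype(np.int64)
    v = np.clip(np.floor(v), 0, height - 1).astype(np.int64)

    # Write far points first so nearer points overwrite them per pixel.
    order = np.argsort(-depth)
    pseudo_image = np.zeros((height, width), dtype=np.float32)
    pseudo_image[v[order], u[order]] = depth[order]
    return pseudo_image
```

In the network itself, the pseudo points and pseudo image would feed learned local and global fusion modules rather than being used raw; the authors' actual implementation is available at the GitHub link above.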