Lightweight LiDAR-Camera Alignment With Homogeneous Local-Global Aware Representation

Detailed description

Bibliographic details
Published in:	IEEE Transactions on Intelligent Transportation Systems, 2024-11, Vol. 25 (11), p. 15922-15933
Main authors:	Zhu, Angfan; Xiao, Yang; Liu, Chengxin; Tan, Mingkui; Cao, Zhiguo
Format:	Article
Language:	English
Subjects:	6-DOF; Cameras; Convolutional neural networks; deep learning; Feature extraction; homogeneous multi-modality representation; Laser radar; LiDAR-camera alignment; local-global spatial awareness; Representation learning; transformer; Transformers

Description
Summary:	In this paper, a novel LiDAR-Camera Alignment (LCA) method using a homogeneous local-global spatial aware representation is proposed. Compared with state-of-the-art methods (e.g., LCCNet), our proposal has two main advantages. First, a homogeneous multi-modality representation learned with a uniform CNN model is applied across the iterative prediction stages, instead of the state-of-the-art heterogeneous counterparts extracted from separate modality-wise CNN models within each stage. In this way, the model size can be significantly decreased (e.g., 12.39M (ours) vs. 333.75M (LCCNet)). Meanwhile, in our proposal the interaction between LiDAR and camera data is built during feature learning to better exploit the descriptive clues, which existing approaches have not adequately addressed. Second, we propose to equip the learned LCA representation with local-global spatial aware capacity by encoding the CNN's local convolutional features with the Transformer's non-local self-attention. Accordingly, the encoded local features jointly capture the local fine details and the global spatial context, and both are used together for LCA. In contrast, existing methods generally reveal the global spatial property by simply concatenating the local features. Additionally, at the initial LCA stage, our pre-alignment method roughly aligns the LiDAR with the camera according to the point distribution characteristics of the LiDAR's 2D projection under the initial extrinsic parameters. Although its structure is simple, it can essentially alleviate the difficulty of LCA for the subsequent stages. To better optimize LCA, a novel loss function that builds the correlation between the translation and rotation loss terms is also proposed. Experiments on KITTI data verify the superiority of our proposal in both effectiveness and efficiency.
The source code will be released at https://github.com/Zaf233/Light-weight-LCA upon acceptance.
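
The abstract refers to the 2D projection of the LiDAR points under the initial extrinsic parameters. As a rough illustration only, the sketch below shows how such a projection is commonly computed with a standard pinhole camera model. It is not the authors' implementation: the function name `project_lidar_to_image`, the parameter layout, and the in-frame filtering are all assumptions made for this example, and the record does not describe how the pre-alignment method uses the resulting point distribution.

```python
from typing import List, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]


def project_lidar_to_image(
    points: Sequence[Point3D],
    rotation: Sequence[Sequence[float]],   # 3x3 extrinsic rotation (LiDAR -> camera)
    translation: Sequence[float],          # 3-vector extrinsic translation
    intrinsics: Tuple[float, float, float, float],  # (fx, fy, cx, cy), hypothetical layout
    width: int,
    height: int,
) -> List[Pixel]:
    """Project LiDAR points to pixel coordinates with a pinhole model.

    Illustrative only: points behind the camera or outside the image
    are dropped.
    """
    fx, fy, cx, cy = intrinsics
    pixels: List[Pixel] = []
    for x, y, z in points:
        # Rigid transform into the camera frame: p_cam = R * p_lidar + t
        xc = rotation[0][0] * x + rotation[0][1] * y + rotation[0][2] * z + translation[0]
        yc = rotation[1][0] * x + rotation[1][1] * y + rotation[1][2] * z + translation[1]
        zc = rotation[2][0] * x + rotation[2][1] * y + rotation[2][2] * z + translation[2]
        if zc <= 0.0:
            continue  # point lies behind the image plane
        # Perspective division followed by the intrinsic mapping
        u = fx * xc / zc + cx
        v = fy * yc / zc + cy
        if 0.0 <= u < width and 0.0 <= v < height:
            pixels.append((u, v))
    return pixels
```

With inaccurate initial extrinsics, many projected points tend to fall outside the image or cluster in one region, which is one plausible reading of the "point distribution characteristics" mentioned in the abstract. The list returned here could feed such a statistic, but the specific statistic the paper uses is not stated in this record.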
ISSN:	1524-9050 1558-0016
DOI:	10.1109/TITS.2024.3409397