I2P Registration by Learning the Underlying Alignment Feature Space from Pixel-to-Point Similarities

Estimating the relative pose between a camera and a LiDAR holds paramount importance in facilitating complex task execution within multi-agent systems. Nonetheless, current methodologies encounter two primary limitations. First, amid the cross-modal feature extraction, they typically employ separate...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	ACM transactions on multimedia computing communications and applications 2024-12, Vol.20 (12), p.1-21, Article 388
Hauptverfasser:	Sun, Yunda, Zhang, Lin, Wang, Zhong, Chen, Yang, Zhao, Shengjie, Zhou, Yicong
Format:	Artikel
Sprache:	eng
Schlagworte:	Computing methodologies Vision for robotics
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Estimating the relative pose between a camera and a LiDAR holds paramount importance in facilitating complex task execution within multi-agent systems. Nonetheless, current methodologies encounter two primary limitations. First, amid the cross-modal feature extraction, they typically employ separate modal branches to extract cross-modal features from images and point clouds. This approach results in the feature spaces of images and point clouds being misaligned, thereby reducing the robustness of establishing correspondences. Second, due to the scale differences between images and point clouds, one-to-many pixel-point correspondences are inevitably encountered, which will mislead the pose optimization. To address these challenges, we propose a framework named Image-to-Point cloud registration by learning the underlying alignment feature space from Pixel-to-Point SIMimilarities (I2P \({}_{\mathbf{ppsim}}\) ). Central to \(\text{I2P}_{\text{ppsim}}\) is a Shared Feature Alignment Module (SFAM). It is designed under on a coarse-to-fine architecture and uses a weight-sharing network to construct an alignment feature space. Benefiting from SFAM, \(\text{I2P}_{\text{ppsim}}\) can effectively identify the co-view regions between images and point clouds and establish high-reliability 2D-3D correspondences. Moreover, to mitigate the one-to-many correspondence issue, we introduce a similarity maximization strategy termed point-max. This strategy effectively filters out outliers, thereby establishing accurate 2D-3D correspondences. To evaluate the efficacy of our framework, we conduct extensive experiments on KITTI Odometry and Oxford Robotcar. The results corroborate the effectiveness of our framework in improving image-to-point cloud registration. To make our results reproducible, the source codes have been released at https://cslinzhang.github.io/I2P
ISSN:	1551-6857 1551-6865
DOI:	10.1145/3697839