Cross domain 2D-3D descriptor matching for unconstrained 6-DOF pose estimation



Bibliographic Details
Published in: Pattern Recognition 2023-10, Vol. 142, p. 109655, Article 109655
Main Authors: Nadeem, Uzair, Bennamoun, Mohammed, Togneri, Roberto, Sohel, Ferdous, Miri Rekavandi, Aref, Boussaid, Farid
Format: Article
Language: English
Online Access: Full Text
Description
Summary:
• A novel concept and framework are introduced to directly match 2D and 3D features.
• The framework is used for 6-DOF pose estimation and for camera and visual localization.
• Current techniques usually rely on Structure from Motion models for localization.
• The proposed framework can be used with any type of 3D model and point cloud.
• The method outperforms state-of-the-art techniques on benchmark datasets.

This paper presents a novel approach for cross-domain descriptor matching between 2D and 3D modalities. The 2D-3D matching is applied to localize 2D images in 3D point clouds. Direct cross-domain matching allows the technique to localize images in any type of 3D point cloud, without constraints on the nature of the cloud or the mechanism by which it was obtained. The authors propose a learning-based framework, called Desc-Matcher, to directly match features between the two modalities. A dataset of 2D and 3D features with corresponding locations in images and point clouds is generated to train the Desc-Matcher. To estimate the pose of an image in a 3D cloud, keypoints and feature descriptors are extracted from the query image and the point cloud. The trained Desc-Matcher is then used to match the features from the image and the point cloud, and a robust pose estimator predicts the location and orientation of the query image from the positions of the matched 2D and 3D features. The method was evaluated extensively in indoor and outdoor scenarios and with different types of point clouds. Experimental results show that the proposed approach can reliably estimate the 6-DOF poses of query cameras in any type of 3D point cloud with high precision, achieving average median errors of 1.09 cm/0.27° and 19 cm/0.39° on the Stanford and Cambridge datasets, respectively.
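The cross-domain matching step described in the abstract can be illustrated with a minimal sketch. The snippet below is not the authors' Desc-Matcher network; it is a generic mutual-nearest-neighbour matcher with a ratio test, applied to already-extracted, L2-normalised 2D (image) and 3D (point-cloud) descriptors. The descriptor dimensions, the similarity measure, and the ratio threshold are all assumptions for illustration only; the matched index pairs it returns are what a robust pose estimator (e.g. PnP with RANSAC) would consume.

```python
import numpy as np

def match_2d_3d(desc_2d, desc_3d, ratio=0.8):
    """Illustrative cross-domain matcher (NOT the learned Desc-Matcher).

    desc_2d: (N, D) array of image descriptors, rows L2-normalised.
    desc_3d: (M, D) array of point-cloud descriptors, rows L2-normalised.
    Returns a list of (i, j) index pairs: image feature i <-> 3D point j.
    """
    # Cosine similarity between every 2D/3D descriptor pair.
    sim = desc_2d @ desc_3d.T                  # shape (N, M)
    nn_12 = sim.argmax(axis=1)                 # best 3D match per 2D feature
    nn_21 = sim.argmax(axis=0)                 # best 2D match per 3D feature

    matches = []
    for i, j in enumerate(nn_12):
        if nn_21[j] != i:                      # keep mutual matches only
            continue
        # Ratio test: reject ambiguous matches where the second-best
        # similarity is close to the best one.
        row = np.sort(sim[i])[::-1]
        if len(row) > 1 and row[1] >= ratio * row[0]:
            continue
        matches.append((i, j))
    return matches
```

The resulting 2D-3D correspondences, together with the keypoint pixel coordinates and 3D point positions, would then be fed to a robust pose solver to recover the 6-DOF camera pose.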
ISSN:0031-3203
1873-5142
DOI:10.1016/j.patcog.2023.109655