Ground Camera Image and Large-Scale 3-D Image-Based Point Cloud Registration Based on Learning Domain Invariant Feature Descriptors

Bibliographic Details
Published in: IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2021, Vol. 14, pp. 997-1009
Authors: Liu, Weiquan; Lai, Baiqi; Wang, Cheng; Cai, Guorong; Su, Yanfei; Bian, Xuesheng; Li, Yongchuan; Chen, Shuting; Li, Jonathan
Format: Article
Language: English
Online access: Full text
Description
Abstract: Multisource data are captured by different sensors or produced by different generation mechanisms. Ground camera images (images taken from a ground-based camera) and rendered images (synthesized from the position information of a 3-D image-based point cloud) are different-source geospatial data, called cross-domain images. In outdoor environments in particular, registering these cross-domain images establishes the spatial relationship between 2-D and 3-D space, which provides an indirect solution for the virtual-real registration required by augmented reality (AR). However, traditional handcrafted feature descriptors cannot match such cross-domain images because of the low quality of the rendered images and the domain gap between the two image sources. In this article, inspired by the success of deep learning in computer vision, we first propose an end-to-end network, DIFD-Net, to learn domain invariant feature descriptors (DIFDs) for cross-domain image patches. The DIFDs are used for cross-domain image patch retrieval to achieve the registration of ground camera and rendered images. Second, we construct a domain-kept consistent loss function, which balances the feature descriptors across domains to narrow the domain gap, and use it to optimize DIFD-Net. Specifically, negative samples are generated from positive ones during training, and a constraint on intermediate feature maps adds extra supervision for learning the descriptors. Finally, experiments show the superiority of DIFDs for cross-domain image patch retrieval, achieving state-of-the-art performance. Additionally, we use DIFDs to match ground camera images and rendered images, and verify the feasibility of the derived AR virtual-real registration in open outdoor environments.
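The abstract does not give the exact form of DIFD-Net or its domain-kept consistent loss. The following is a minimal, hypothetical PyTorch sketch of the general idea only: a pseudo-siamese pair of patch-descriptor CNNs, plus a triplet-style loss whose negatives are mined in-batch from the positives of other pairs (mirroring the abstract's note that negatives are generated from positives during training). All names (DescriptorNet, domain_consistent_triplet_loss), layer sizes, the 32x32 patch size, and the 128-D descriptor dimension are illustrative assumptions, not the published method; the paper's additional constraint on intermediate feature maps is omitted here.

```python
# Hypothetical sketch of a DIFD-style descriptor network and loss.
# Not the published DIFD-Net; architecture and loss are assumptions.
import torch
import torch.nn as nn
import torch.nn.functional as F

class DescriptorNet(nn.Module):
    """Small CNN mapping a 32x32 grayscale patch (assumed size) to an
    L2-normalized 128-D descriptor (assumed dimension)."""
    def __init__(self, dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.BatchNorm2d(32), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.BatchNorm2d(64), nn.ReLU(),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.BatchNorm2d(128), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(128, dim)

    def forward(self, x):
        f = self.features(x).flatten(1)
        return F.normalize(self.fc(f), dim=1)

def domain_consistent_triplet_loss(anchor, positive, margin=1.0):
    """Triplet-style loss: each anchor's negative is mined in-batch
    from the positives belonging to other anchors."""
    d = torch.cdist(anchor, positive)            # (B, B) pairwise distances
    pos = d.diag()                               # matching cross-domain pairs
    # Hardest in-batch negative: closest non-matching positive
    # (diagonal masked out with a large constant).
    neg = (d + torch.eye(len(d), device=d.device) * 1e6).min(dim=1).values
    return F.relu(pos - neg + margin).mean()

# Usage: one network per domain (a pseudo-siamese setup; whether the
# paper shares weights across domains is not specified in the abstract).
cam_net, ren_net = DescriptorNet(), DescriptorNet()
cam_patch = torch.randn(16, 1, 32, 32)   # ground-camera patches
ren_patch = torch.randn(16, 1, 32, 32)   # rendered counterparts
loss = domain_consistent_triplet_loss(cam_net(cam_patch), ren_net(ren_patch))
loss.backward()
```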
ISSN: 1939-1404, 2151-1535
DOI: 10.1109/JSTARS.2020.3035359