UP-Net: unique keyPoint description and detection net
Published in: Machine Vision and Applications, 2022, Vol. 33 (1), Article 13
Main authors: , , , ,
Format: Article
Language: English
Online access: Full text
Abstract: Many computer vision tasks, such as simultaneous localization and mapping, visual localization, image retrieval, pose estimation, and structure-from-motion, rely on keypoint matching between image pairs. Recently, jointly learned keypoint descriptor and detector networks with simple structures have shown highly competitive performance. However, most of them have two limitations: (1) the positioning accuracy of detected keypoints is poor, which negatively impacts many applications; (2) by emphasizing only repeatability in keypoint detection, mismatches occur easily in textured areas. In this work, we make two enhancements to D2-Net to address these problems: first, feature fusion is used to enrich feature information across levels and improve the positioning accuracy of keypoints; second, a uniqueness index is added to keypoint detection, eliminating keypoints on repeated patterns in textured regions and making the detected keypoints more effective and accurate. Furthermore, we use homography to build correspondences between image pairs and exploit them for unsupervised training. Our method achieves leading performance on the HPatches dataset for image matching, especially on its illumination sequences, with a 5% improvement over the state-of-the-art ASLFeat method at a projection error threshold of 10 px. Meanwhile, our keypoint positioning accuracy is twice that of D2-Net at a strict projection error threshold. It also exhibits competitive performance in 3D reconstruction and visual localization experiments.
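The abstract's unsupervised-training idea rests on a standard geometric fact: under a known planar homography, every keypoint in one image has an exactly computable correspondence in the other, so no manual labels are needed. The sketch below shows only that general mechanism, not the paper's actual implementation; the function name `warp_keypoints` and the example homography are illustrative assumptions.

```python
import numpy as np

def warp_keypoints(kpts, H):
    """Map (N, 2) pixel coordinates through a 3x3 homography H.

    Converting to homogeneous coordinates, applying H, and dividing by
    the last component yields the corresponding pixel locations in the
    second image, which can serve as correspondence supervision.
    """
    n = kpts.shape[0]
    homo = np.hstack([kpts, np.ones((n, 1))])  # (N, 3) homogeneous coords
    warped = homo @ H.T                        # apply H to each point
    return warped[:, :2] / warped[:, 2:3]      # perspective divide -> (N, 2)

# Illustrative example: a pure-translation homography shifts every
# keypoint by (+5, -3), so the correspondences are exact.
H = np.array([[1.0, 0.0, 5.0],
              [0.0, 1.0, -3.0],
              [0.0, 0.0, 1.0]])
pts = np.array([[10.0, 20.0], [100.0, 50.0]])
print(warp_keypoints(pts, H))  # [[ 15.  17.] [105.  47.]]
```

In training pipelines of this kind, the homography is typically either synthesized (by warping one image to create the pair) or estimated offline, and a predicted match is counted correct when it lands within the projection error threshold of the warped point, as in the HPatches evaluation cited above.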
ISSN: 0932-8092, 1432-1769
DOI: 10.1007/s00138-021-01266-7