Scene Coordinate Regression Network With Global Context-Guided Spatial Feature Transformation for Visual Relocalization

Bibliographic Details
Published in: IEEE Robotics and Automation Letters, July 2021, Vol. 6, No. 3, pp. 5737-5744
Authors: Guan, Peiyu; Cao, Zhiqiang; Yu, Junzhi; Zhou, Chao; Tan, Min
Format: Article
Language: English
Abstract

For visual relocalization from a single RGB image, scene coordinate regression (SCoRe) based on convolutional neural networks (CNNs) has become the prevailing approach. However, because of the fixed geometric structure of CNNs, such networks struggle to extract features that remain invariant under viewpoint changes. In this letter, we propose a global context-guided spatial feature transformation (SFT) network that learns invariant feature representations for robustness against viewpoint changes. Specifically, a global feature extracted from the source feature map is treated as a dynamic convolutional kernel, which is convolved with the source feature map to predict transformation parameters. The predicted parameters transform features from multiple viewpoints into a canonical space under the constraint of a maximum-likelihood-derived loss, thereby achieving viewpoint invariance. CoordConv is also employed to further improve the discriminability of features in texture-less or repetitive regions. The proposed SFT network can be easily incorporated into a general SCoRe network. To the best of our knowledge, this is the first work to explicitly decouple features from viewpoints in a SCoRe network via a spatial feature transformation network, achieving stable and accurate visual relocalization. Experimental results demonstrate the effectiveness of the proposed method in terms of accuracy and efficiency.
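
The abstract describes the SFT mechanism only at a high level, so the following PyTorch sketch is purely illustrative of that mechanism, not the authors' implementation: the module name GlobalContextSFT, the channel sizes, the use of global average pooling to form the dynamic kernel, and the choice of a per-pixel scale/shift (affine) transform are all assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F


class GlobalContextSFT(nn.Module):
    # Illustrative sketch (assumed design): a global feature acts as a
    # dynamic convolutional kernel whose response over the source map
    # guides a per-pixel feature transformation toward a canonical space.
    def __init__(self, channels: int):
        super().__init__()
        # Predicts per-pixel scale (gamma) and shift (beta) parameters
        # from the global-context response map.
        self.param_head = nn.Conv2d(1, 2 * channels, kernel_size=3, padding=1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        b, c, h, w = feat.shape
        # Global feature of the source map, reshaped into one dynamic
        # 1x1 kernel per sample (global average pooling is an assumption).
        kernel = feat.mean(dim=(2, 3)).reshape(b, c, 1, 1)
        # Convolve each sample with its own kernel: fold the batch into
        # the channel axis and use a grouped convolution with groups=b.
        ctx = F.conv2d(feat.reshape(1, b * c, h, w), kernel, groups=b)
        ctx = ctx.reshape(b, 1, h, w)
        # Predict transformation parameters and apply them to the
        # source features.
        gamma, beta = self.param_head(ctx).chunk(2, dim=1)
        return feat * gamma + beta


def add_coordconv(feat: torch.Tensor) -> torch.Tensor:
    # Appends normalized x/y coordinate channels (CoordConv) so that
    # texture-less or repetitive regions become more discriminable.
    b, _, h, w = feat.shape
    ys = torch.linspace(-1.0, 1.0, h, device=feat.device, dtype=feat.dtype)
    xs = torch.linspace(-1.0, 1.0, w, device=feat.device, dtype=feat.dtype)
    gy, gx = torch.meshgrid(ys, xs, indexing="ij")
    coords = torch.stack([gx, gy]).expand(b, -1, -1, -1)
    return torch.cat([feat, coords], dim=1)

In a SCoRe pipeline, a regression head would then map the transformed (and coordinate-augmented) features to per-pixel 3-D scene coordinates; the maximum-likelihood-derived loss that constrains the canonical space is not reproduced here.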
ISSN: 2377-3766
DOI: 10.1109/LRA.2021.3082473