It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement
Published in: | IEEE transactions on geoscience and remote sensing, 2022, Vol. 60, p. 1-13 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Cross-view image geo-localization is the challenging task of estimating the geospatial location of a street-view image by matching it against a database of geotagged aerial/satellite images, and vice versa. Existing CNN-based approaches attempt to generate discriminative representations in a single step; in this article, we instead advocate endowing the network with the capability of progressive self-correction. To this end, we propose a novel step-adaptive iterative refinement network (SIRNet), which decomposes the complex learning process into several refinement steps and adapts the number of refinement steps to each input. Specifically, SIRNet takes the output of the backbone as a rough network prediction and refines it via an iterative refinement module (IRM). The IRM cascades several refinement blocks that share the same structure. The goal of each refinement block is to improve the output of the previous block under the guidance of height-wise context. In this way, the IRM improves the rough network prediction step by step, and the refined features focus increasingly on more discriminative scene regions. In addition, because input images differ in their characteristics, we devise an adaptive step estimation (ASE) mechanism, which enables SIRNet to adapt the number of refinement steps to each input automatically. Concretely, the ASE compares features at adjacent refinement steps, estimates whether the next step would bring improvements, and makes a halting decision at each refinement step. With the ASE, SIRNet becomes a dynamic architecture that accounts for the characteristics of each input when performing iterative refinement.
Extensive experiments demonstrate that our SIRNet performs favorably against the state-of-the-art methods on the CVUSA and the CVACT datasets. Furthermore, quantitative and qualitative experimental results demonstrate our approach's wide applicability, impressive generalization ability, and robustness. |
---|---|
ISSN: | 0196-2892 1558-0644 |
DOI: | 10.1109/TGRS.2022.3210195 |
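
The refine-then-halt loop described in the summary can be illustrated with a small sketch. This is not the SIRNet implementation: the record gives no architectural details, so the names `refine_adaptively`, `refine_block`, `max_steps`, and `halt_threshold` are hypothetical, and the halting rule (stop once features at adjacent steps are nearly identical in cosine similarity) is a hand-written stand-in for the ASE mechanism, whose actual halting criterion is not specified here.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def refine_adaptively(
    rough: Vector,
    refine_block: Callable[[Vector], Vector],
    max_steps: int = 4,
    halt_threshold: float = 0.999,
) -> Tuple[Vector, int]:
    """Refine `rough` step by step and halt early when a step changes little.

    Assumption: the halting rule below (cosine similarity between features at
    adjacent steps) is an illustrative stand-in, not the paper's ASE.
    """
    current = rough
    steps_used = 0
    for _ in range(max_steps):
        candidate = refine_block(current)  # one refinement block, reused each step
        steps_used += 1
        similarity = cosine_similarity(current, candidate)
        current = candidate
        if similarity >= halt_threshold:   # the last step barely changed the features
            break
    return current, steps_used


def toy_block(features: Vector) -> Vector:
    """Hypothetical refinement block: move halfway toward a fixed target vector."""
    target = [1.0, 0.0]
    return [0.5 * f + 0.5 * t for f, t in zip(features, target)]


if __name__ == "__main__":
    refined, steps = refine_adaptively([0.0, 1.0], toy_block, max_steps=8)
    print(steps, refined)
```

In this toy setting, an input that already lies close to the target halts after fewer steps than one that starts far from it, which is the per-input behaviour the summary attributes to the adaptive step estimation.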