It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement

Cross-view image geo-localization is a challenging task of estimating the geospatial location of a street-view image by matching it with a database of geotagged aerial/satellite images, and vice versa. Compared to existing CNN-based approaches that attempt to generate discriminative representations...

Detailed description

Saved in:
Bibliographic details
Published in: IEEE transactions on geoscience and remote sensing 2022, Vol.60, p.1-13
Main authors: Lu, Xiufan, Luo, Siqi, Zhu, Yingying
Format: Article
Language: eng
container_end_page 13
container_issue
container_start_page 1
container_title IEEE transactions on geoscience and remote sensing
container_volume 60
creator Lu, Xiufan
Luo, Siqi
Zhu, Yingying
description Cross-view image geo-localization is a challenging task of estimating the geospatial location of a street-view image by matching it with a database of geotagged aerial/satellite images, and vice versa. Compared to existing CNN-based approaches that attempt to generate discriminative representations in a single step for this task, in this article, we instead advocate endowing the network with the capability of progressive self-correcting. Toward this target, we propose a novel step-adaptive iterative refinement network (SIRNet), which decomposes the complex learning process into several refinement steps while adapting the refinement steps specifically for each input. Specifically, the SIRNet takes the output of the backbone as a rough network prediction and iteratively refines it via an iterative refinement module (IRM). The IRM cascades several refinement blocks sharing the same structure for progressive self-correcting. For each refinement block, the goal is to improve the output of the previous refinement block under the guidance of height-wise context. In this way, the IRM is capable of improving the rough network prediction step by step, and the refined features are increasingly focused on more discriminative scene regions as they are iteratively refined. In addition, considering different characteristics of input images, we devise an adaptive step estimation (ASE) mechanism, which enables our SIRNet to adapt the number of refinement steps to each input automatically. Concretely, the ASE is performed by comparing features at adjacent refinement steps, estimating whether the next step brings improvements, and finally making a halting decision at each refinement step. With the ASE, our SIRNet becomes a dynamic architecture that considers different characteristics of the inputs when performing the iterative refinement. Extensive experiments demonstrate that our SIRNet performs favorably against the state-of-the-art methods on the CVUSA and the CVACT datasets. Furthermore, quantitative and qualitative experimental results demonstrate our approach's wide applicability, impressive generalization ability, and robustness.
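Illustrative note (not part of the catalog record): the description above outlines a loop in which a rough feature is refined block by block and a halting decision is made at each step by comparing features at adjacent steps. The following is a minimal sketch of that control flow only, under stated assumptions. The names `refine_adaptively`, `toy_block`, `max_steps`, and `tol` are hypothetical, and the fixed distance threshold is a simple stand-in for the ASE mechanism, whose actual form is not given in this record.

```python
import math
from typing import Callable, List, Tuple

Vector = List[float]


def l2_distance(a: Vector, b: Vector) -> float:
    """Euclidean distance between two feature vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def refine_adaptively(
    rough: Vector,
    refine_block: Callable[[Vector], Vector],
    max_steps: int = 8,   # assumed upper bound on refinement steps
    tol: float = 1e-3,    # assumed halting threshold (stand-in for ASE)
) -> Tuple[Vector, int]:
    """Refine `rough` step by step; halt once adjacent steps barely differ.

    Returns the final features and the number of steps actually used.
    """
    current = rough
    for step in range(1, max_steps + 1):
        refined = refine_block(current)
        change = l2_distance(refined, current)
        current = refined
        # Halting decision: a small change between adjacent steps is taken
        # to mean that a further step would bring little improvement.
        if change < tol:
            return current, step
    return current, max_steps


def toy_block(features: Vector) -> Vector:
    """Hypothetical refinement block: moves each value halfway toward 1.0."""
    return [x + 0.5 * (1.0 - x) for x in features]


# An input that starts close to its fixed point halts early; one that
# starts far away uses the full step budget.
easy, easy_steps = refine_adaptively([0.99, 1.0], toy_block)
hard, hard_steps = refine_adaptively([0.0, 0.0], toy_block)
print(easy_steps, hard_steps)
```

The point of the sketch is only that the number of refinement steps depends on the input, which is the property the description attributes to the dynamic architecture. In the described network the refinement blocks and the halting decision would be part of the model rather than hand-written functions.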
doi_str_mv 10.1109/TGRS.2022.3210195
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_9913952</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9913952</ieee_id><sourcerecordid>2734386007</sourcerecordid><originalsourceid>FETCH-LOGICAL-c223t-3f97babacf2e0ac854e0cb3fbe9502ad0f154276f799ba62e5e19de05cbfaa1e3</originalsourceid><addsrcrecordid>eNo9kE1LAzEQhoMoWKs_QLwEPHhKzcdmd-OtFq2FQqGtFryE7HaiW9vNmk2V-uvdtcXTDMPzzjAPQpeM9hij6nY-nM56nHLeE5xRpuQR6jApU0LjKDpGnWYUE54qforO6npFKYskSzrodRRuajz5MDscHL4HvPCufLvDA-_qmrwU8I2H4MjY5WZd_JhQuBIvivCOZwEq0l-aKhRfgEcBvPnrpmCLEjZQhnN0Ys26hotD7aLnx4f54ImMJ8PRoD8mOeciEGFVkpnM5JYDNXkqI6B5JmwGSlJultQyGfEktolSmYk5SGBqCVTmmTWGgeii6_3eyrvPLdRBr9zWl81JzRMRiTSmNGkotqfy9jMPVle-2Bi_04zqVqFuFepWoT4obDJX-0wBAP-8UkwoycUvhHZtYg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2734386007</pqid></control><display><type>article</type><title>It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement</title><source>IEEE Electronic Library (IEL)</source><creator>Lu, Xiufan ; Luo, Siqi ; Zhu, Yingying</creator><creatorcontrib>Lu, Xiufan ; Luo, Siqi ; Zhu, Yingying</creatorcontrib><description>Cross-view image geo-localization is a challenging task of estimating the geospatial location of a street-view image by matching it with a database of geotagged aerial/satellite images, and vice versa. Compared to existing CNN-based approaches that attempt to generate discriminative representations in a single step for this task, in this article, we instead advocate endowing the network with the capability of progressive self-correcting. Toward this target, we propose a novel step-adaptive iterative refinement network (SIRNet), which decomposes the complex learning process into several refinement steps while adapting the refinement steps specifically for each input. 
Specifically, the SIRNet takes the output of the backbone as a rough network prediction and iteratively refines it via an iterative refinement module (IRM). The IRM cascades several refinement blocks sharing the same structure for progressive self-correcting. For each refinement block, the goal is to improve the output of the previous refinement block under the guidance of height-wise context. In this way, the IRM is capable of improving the rough network prediction step by step, and the refined features are increasingly focused on more discriminative scene regions as they are iteratively refined. In addition, considering different characteristics of input images, we devise an adaptive step estimation (ASE) mechanism, which enables our SIRNet to adapt the number of refinement steps to each input automatically. Concretely, the ASE is performed by comparing features at adjacent refinement steps, estimating whether the next step brings improvements, and finally making a halting decision at each refinement step. With the ASE, our SIRNet becomes a dynamic architecture that considers different characteristics of the inputs when performing the iterative refinement. Extensive experiments demonstrate that our SIRNet performs favorably against the state-of-the-art methods on the CVUSA and the CVACT datasets. 
Furthermore, quantitative and qualitative experimental results demonstrate our approach's wide applicability, impressive generalization ability, and robustness.</description><identifier>ISSN: 0196-2892</identifier><identifier>EISSN: 1558-0644</identifier><identifier>DOI: 10.1109/TGRS.2022.3210195</identifier><identifier>CODEN: IGRSD2</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Adaptive estimation ; convolutional neural network ; cross-view geo-localization ; Decision making ; Estimation ; image retrieval ; Iterative methods ; iterative refinement ; Localization ; Location awareness ; Satellite imagery ; Task analysis ; Transforms ; Transient analysis ; Visualization</subject><ispartof>IEEE transactions on geoscience and remote sensing, 2022, Vol.60, p.1-13</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c223t-3f97babacf2e0ac854e0cb3fbe9502ad0f154276f799ba62e5e19de05cbfaa1e3</citedby><cites>FETCH-LOGICAL-c223t-3f97babacf2e0ac854e0cb3fbe9502ad0f154276f799ba62e5e19de05cbfaa1e3</cites><orcidid>0000-0002-3475-6186</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9913952$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,4010,27900,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9913952$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Lu, Xiufan</creatorcontrib><creatorcontrib>Luo, Siqi</creatorcontrib><creatorcontrib>Zhu, Yingying</creatorcontrib><title>It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement</title><title>IEEE transactions on geoscience and remote 
sensing</title><addtitle>TGRS</addtitle><description>Cross-view image geo-localization is a challenging task of estimating the geospatial location of a street-view image by matching it with a database of geotagged aerial/satellite images, and vice versa. Compared to existing CNN-based approaches that attempt to generate discriminative representations in a single step for this task, in this article, we instead advocate endowing the network with the capability of progressive self-correcting. Toward this target, we propose a novel step-adaptive iterative refinement network (SIRNet), which decomposes the complex learning process into several refinement steps while adapting the refinement steps specifically for each input. Specifically, the SIRNet takes the output of the backbone as a rough network prediction and iteratively refines it via an iterative refinement module (IRM). The IRM cascades several refinement blocks sharing the same structure for progressive self-correcting. For each refinement block, the goal is to improve the output of the previous refinement block under the guidance of height-wise context. In this way, the IRM is capable of improving the rough network prediction step by step, and the refined features are increasingly focused on more discriminative scene regions as they are iteratively refined. In addition, considering different characteristics of input images, we devise an adaptive step estimation (ASE) mechanism, which enables our SIRNet to adapt the number of refinement steps to each input automatically. Concretely, the ASE is performed by comparing features at adjacent refinement steps, estimating whether the next step brings improvements, and finally making a halting decision at each refinement step. With the ASE, our SIRNet becomes a dynamic architecture that considers different characteristics of the inputs when performing the iterative refinement. 
Extensive experiments demonstrate that our SIRNet performs favorably against the state-of-the-art methods on the CVUSA and the CVACT datasets. Furthermore, quantitative and qualitative experimental results demonstrate our approach's wide applicability, impressive generalization ability, and robustness.</description><subject>Adaptive estimation</subject><subject>convolutional neural network</subject><subject>cross-view geo-localization</subject><subject>Decision making</subject><subject>Estimation</subject><subject>image retrieval</subject><subject>Iterative methods</subject><subject>iterative refinement</subject><subject>Localization</subject><subject>Location awareness</subject><subject>Satellite imagery</subject><subject>Task analysis</subject><subject>Transforms</subject><subject>Transient analysis</subject><subject>Visualization</subject><issn>0196-2892</issn><issn>1558-0644</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kE1LAzEQhoMoWKs_QLwEPHhKzcdmd-OtFq2FQqGtFryE7HaiW9vNmk2V-uvdtcXTDMPzzjAPQpeM9hij6nY-nM56nHLeE5xRpuQR6jApU0LjKDpGnWYUE54qforO6npFKYskSzrodRRuajz5MDscHL4HvPCufLvDA-_qmrwU8I2H4MjY5WZd_JhQuBIvivCOZwEq0l-aKhRfgEcBvPnrpmCLEjZQhnN0Ys26hotD7aLnx4f54ImMJ8PRoD8mOeciEGFVkpnM5JYDNXkqI6B5JmwGSlJultQyGfEktolSmYk5SGBqCVTmmTWGgeii6_3eyrvPLdRBr9zWl81JzRMRiTSmNGkotqfy9jMPVle-2Bi_04zqVqFuFepWoT4obDJX-0wBAP-8UkwoycUvhHZtYg</recordid><startdate>2022</startdate><enddate>2022</enddate><creator>Lu, Xiufan</creator><creator>Luo, Siqi</creator><creator>Zhu, Yingying</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7UA</scope><scope>8FD</scope><scope>C1K</scope><scope>F1W</scope><scope>FR3</scope><scope>H8D</scope><scope>H96</scope><scope>KR7</scope><scope>L.G</scope><scope>L7M</scope><orcidid>https://orcid.org/0000-0002-3475-6186</orcidid></search><sort><creationdate>2022</creationdate><title>It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement</title><author>Lu, Xiufan ; Luo, Siqi ; Zhu, Yingying</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c223t-3f97babacf2e0ac854e0cb3fbe9502ad0f154276f799ba62e5e19de05cbfaa1e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Adaptive estimation</topic><topic>convolutional neural network</topic><topic>cross-view geo-localization</topic><topic>Decision making</topic><topic>Estimation</topic><topic>image retrieval</topic><topic>Iterative methods</topic><topic>iterative refinement</topic><topic>Localization</topic><topic>Location awareness</topic><topic>Satellite imagery</topic><topic>Task analysis</topic><topic>Transforms</topic><topic>Transient analysis</topic><topic>Visualization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lu, Xiufan</creatorcontrib><creatorcontrib>Luo, Siqi</creatorcontrib><creatorcontrib>Zhu, Yingying</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Water Resources Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ASFA: Aquatic Sciences and Fisheries 
Abstracts</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Aquatic Science &amp; Fisheries Abstracts (ASFA) 2: Ocean Technology, Policy &amp; Non-Living Resources</collection><collection>Civil Engineering Abstracts</collection><collection>Aquatic Science &amp; Fisheries Abstracts (ASFA) Professional</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>IEEE transactions on geoscience and remote sensing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lu, Xiufan</au><au>Luo, Siqi</au><au>Zhu, Yingying</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement</atitle><jtitle>IEEE transactions on geoscience and remote sensing</jtitle><stitle>TGRS</stitle><date>2022</date><risdate>2022</risdate><volume>60</volume><spage>1</spage><epage>13</epage><pages>1-13</pages><issn>0196-2892</issn><eissn>1558-0644</eissn><coden>IGRSD2</coden><abstract>Cross-view image geo-localization is a challenging task of estimating the geospatial location of a street-view image by matching it with a database of geotagged aerial/satellite images, and vice versa. Compared to existing CNN-based approaches that attempt to generate discriminative representations in a single step for this task, in this article, we instead advocate endowing the network with the capability of progressive self-correcting. Toward this target, we propose a novel step-adaptive iterative refinement network (SIRNet), which decomposes the complex learning process into several refinement steps while adapting the refinement steps specifically for each input. Specifically, the SIRNet takes the output of the backbone as a rough network prediction and iteratively refines it via an iterative refinement module (IRM). 
The IRM cascades several refinement blocks sharing the same structure for progressive self-correcting. For each refinement block, the goal is to improve the output of the previous refinement block under the guidance of height-wise context. In this way, the IRM is capable of improving the rough network prediction step by step, and the refined features are increasingly focused on more discriminative scene regions as they are iteratively refined. In addition, considering different characteristics of input images, we devise an adaptive step estimation (ASE) mechanism, which enables our SIRNet to adapt the number of refinement steps to each input automatically. Concretely, the ASE is performed by comparing features at adjacent refinement steps, estimating whether the next step brings improvements, and finally making a halting decision at each refinement step. With the ASE, our SIRNet becomes a dynamic architecture that considers different characteristics of the inputs when performing the iterative refinement. Extensive experiments demonstrate that our SIRNet performs favorably against the state-of-the-art methods on the CVUSA and the CVACT datasets. Furthermore, quantitative and qualitative experimental results demonstrate our approach's wide applicability, impressive generalization ability, and robustness.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TGRS.2022.3210195</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0002-3475-6186</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 0196-2892
ispartof IEEE transactions on geoscience and remote sensing, 2022, Vol.60, p.1-13
issn 0196-2892
1558-0644
language eng
recordid cdi_ieee_primary_9913952
source IEEE Electronic Library (IEL)
subjects Adaptive estimation
convolutional neural network
cross-view geo-localization
Decision making
Estimation
image retrieval
Iterative methods
iterative refinement
Localization
Location awareness
Satellite imagery
Task analysis
Transforms
Transient analysis
Visualization
title It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T05%3A36%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=It's%20Okay%20to%20Be%20Wrong:%20Cross-View%20Geo-Localization%20With%20Step-Adaptive%20Iterative%20Refinement&rft.jtitle=IEEE%20transactions%20on%20geoscience%20and%20remote%20sensing&rft.au=Lu,%20Xiufan&rft.date=2022&rft.volume=60&rft.spage=1&rft.epage=13&rft.pages=1-13&rft.issn=0196-2892&rft.eissn=1558-0644&rft.coden=IGRSD2&rft_id=info:doi/10.1109/TGRS.2022.3210195&rft_dat=%3Cproquest_RIE%3E2734386007%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2734386007&rft_id=info:pmid/&rft_ieee_id=9913952&rfr_iscdi=true