It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement

Cross-view image geo-localization is a challenging task of estimating the geospatial location of a street-view image by matching it with a database of geotagged aerial/satellite images, and vice versa. Compared to existing CNN-based approaches that attempt to generate discriminative representations...

Bibliographic details
Published in:	IEEE transactions on geoscience and remote sensing, 2022, Vol.60, p.1-13
Main authors:	Lu, Xiufan; Luo, Siqi; Zhu, Yingying
Format:	Article
Language:	eng
Subjects:	Adaptive estimation; convolutional neural network; cross-view geo-localization; Decision making; Estimation; image retrieval; Iterative methods; iterative refinement; Localization; Location awareness; Satellite imagery; Task analysis; Transforms; Transient analysis; Visualization
Online access:	Order full text

container_end_page	13
container_issue
container_start_page	1
container_title	IEEE transactions on geoscience and remote sensing
container_volume	60
creator	Lu, Xiufan; Luo, Siqi; Zhu, Yingying
description	Cross-view image geo-localization is a challenging task of estimating the geospatial location of a street-view image by matching it with a database of geotagged aerial/satellite images, and vice versa. Compared to existing CNN-based approaches that attempt to generate discriminative representations in a single step for this task, in this article, we instead advocate endowing the network with the capability of progressive self-correcting. Toward this target, we propose a novel step-adaptive iterative refinement network (SIRNet), which decomposes the complex learning process into several refinement steps while adapting the refinement steps specifically for each input. Specifically, the SIRNet takes the output of the backbone as a rough network prediction and iteratively refines it via an iterative refinement module (IRM). The IRM cascades several refinement blocks sharing the same structure for progressive self-correcting. For each refinement block, the goal is to improve the output of the previous refinement block under the guidance of height-wise context. In this way, the IRM is capable of improving the rough network prediction step by step, and the refined features are increasingly focused on more discriminative scene regions as they are iteratively refined. In addition, considering different characteristics of input images, we devise an adaptive step estimation (ASE) mechanism, which enables our SIRNet to adapt the number of refinement steps to each input automatically. Concretely, the ASE is performed by comparing features at adjacent refinement steps, estimating whether the next step brings improvements, and finally making a halting decision at each refinement step. With the ASE, our SIRNet becomes a dynamic architecture that considers different characteristics of the inputs when performing the iterative refinement. Extensive experiments demonstrate that our SIRNet performs favorably against the state-of-the-art methods on the CVUSA and the CVACT datasets. Furthermore, quantitative and qualitative experimental results demonstrate our approach's wide applicability, impressive generalization ability, and robustness.
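
The halting idea in the description (compare features at adjacent refinement steps, then decide whether to continue) can be illustrated with a small sketch. This is not the SIRNet implementation: the record does not say how the ASE estimates improvement, so a fixed cosine-distance threshold is used here as a stand-in, and `refine_adaptively`, `cosine_distance`, `max_steps`, `tol`, and the toy refinement step are all hypothetical names and values chosen for illustration.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors given as lists of floats."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def refine_adaptively(feature, refine_step, max_steps=8, tol=1e-3):
    """Apply refine_step repeatedly and halt once adjacent features barely differ.

    ASSUMPTION: a fixed threshold on the change between adjacent steps stands in
    for the halting decision; the record does not specify the actual criterion.
    Returns (refined_feature, number_of_steps_applied).
    """
    current = feature
    for step in range(1, max_steps + 1):
        candidate = refine_step(current)
        change = cosine_distance(current, candidate)
        current = candidate
        if change < tol:  # the latest step changed little, so stop refining
            return current, step
    return current, max_steps


# Toy refinement step (hypothetical): move the feature halfway toward a target.
TARGET = [1.0, 0.0]


def toy_step(feature):
    return [(x + t) / 2.0 for x, t in zip(feature, TARGET)]


# A far-off input and a nearly converged input need different numbers of steps.
_, hard_steps = refine_adaptively([0.0, 1.0], toy_step)
_, easy_steps = refine_adaptively([1.0, 0.001], toy_step)
print(hard_steps, easy_steps)
```

The point of the sketch is only that the number of steps depends on the input: the nearly converged feature halts earlier than the one that starts far from the target.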
doi_str_mv	10.1109/TGRS.2022.3210195
format	Article
fullrecord	<record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_9913952</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9913952</ieee_id><sourcerecordid>2734386007</sourcerecordid><originalsourceid>FETCH-LOGICAL-c223t-3f97babacf2e0ac854e0cb3fbe9502ad0f154276f799ba62e5e19de05cbfaa1e3</originalsourceid><addsrcrecordid>eNo9kE1LAzEQhoMoWKs_QLwEPHhKzcdmd-OtFq2FQqGtFryE7HaiW9vNmk2V-uvdtcXTDMPzzjAPQpeM9hij6nY-nM56nHLeE5xRpuQR6jApU0LjKDpGnWYUE54qforO6npFKYskSzrodRRuajz5MDscHL4HvPCufLvDA-_qmrwU8I2H4MjY5WZd_JhQuBIvivCOZwEq0l-aKhRfgEcBvPnrpmCLEjZQhnN0Ys26hotD7aLnx4f54ImMJ8PRoD8mOeciEGFVkpnM5JYDNXkqI6B5JmwGSlJultQyGfEktolSmYk5SGBqCVTmmTWGgeii6_3eyrvPLdRBr9zWl81JzRMRiTSmNGkotqfy9jMPVle-2Bi_04zqVqFuFepWoT4obDJX-0wBAP-8UkwoycUvhHZtYg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2734386007</pqid></control><display><type>article</type><title>It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement</title><source>IEEE Electronic Library (IEL)</source><creator>Lu, Xiufan ; Luo, Siqi ; Zhu, Yingying</creator><creatorcontrib>Lu, Xiufan ; Luo, Siqi ; Zhu, Yingying</creatorcontrib><description>Cross-view image geo-localization is a challenging task of estimating the geospatial location of a street-view image by matching it with a database of geotagged aerial/satellite images, and vice versa. Compared to existing CNN-based approaches that attempt to generate discriminative representations in a single step for this task, in this article, we instead advocate endowing the network with the capability of progressive self-correcting. Toward this target, we propose a novel step-adaptive iterative refinement network (SIRNet), which decomposes the complex learning process into several refinement steps while adapting the refinement steps specifically for each input. 
Specifically, the SIRNet takes the output of the backbone as a rough network prediction and iteratively refines it via an iterative refinement module (IRM). The IRM cascades several refinement blocks sharing the same structure for progressive self-correcting. For each refinement block, the goal is to improve the output of the previous refinement block under the guidance of height-wise context. In this way, the IRM is capable of improving the rough network prediction step by step, and the refined features are increasingly focused on more discriminative scene regions as they are iteratively refined. In addition, considering different characteristics of input images, we devise an adaptive step estimation (ASE) mechanism, which enables our SIRNet to adapt the number of refinement steps to each input automatically. Concretely, the ASE is performed by comparing features at adjacent refinement steps, estimating whether the next step brings improvements, and finally making a halting decision at each refinement step. With the ASE, our SIRNet becomes a dynamic architecture that considers different characteristics of the inputs when performing the iterative refinement. Extensive experiments demonstrate that our SIRNet performs favorably against the state-of-the-art methods on the CVUSA and the CVACT datasets. 
Furthermore, quantitative and qualitative experimental results demonstrate our approach's wide applicability, impressive generalization ability, and robustness.</description><identifier>ISSN: 0196-2892</identifier><identifier>EISSN: 1558-0644</identifier><identifier>DOI: 10.1109/TGRS.2022.3210195</identifier><identifier>CODEN: IGRSD2</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Adaptive estimation ; convolutional neural network ; cross-view geo-localization ; Decision making ; Estimation ; image retrieval ; Iterative methods ; iterative refinement ; Localization ; Location awareness ; Satellite imagery ; Task analysis ; Transforms ; Transient analysis ; Visualization</subject><ispartof>IEEE transactions on geoscience and remote sensing, 2022, Vol.60, p.1-13</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c223t-3f97babacf2e0ac854e0cb3fbe9502ad0f154276f799ba62e5e19de05cbfaa1e3</citedby><cites>FETCH-LOGICAL-c223t-3f97babacf2e0ac854e0cb3fbe9502ad0f154276f799ba62e5e19de05cbfaa1e3</cites><orcidid>0000-0002-3475-6186</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9913952$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,4010,27900,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9913952$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Lu, Xiufan</creatorcontrib><creatorcontrib>Luo, Siqi</creatorcontrib><creatorcontrib>Zhu, Yingying</creatorcontrib><title>It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement</title><title>IEEE transactions on geoscience and remote 
sensing</title><addtitle>TGRS</addtitle><description>Cross-view image geo-localization is a challenging task of estimating the geospatial location of a street-view image by matching it with a database of geotagged aerial/satellite images, and vice versa. Compared to existing CNN-based approaches that attempt to generate discriminative representations in a single step for this task, in this article, we instead advocate endowing the network with the capability of progressive self-correcting. Toward this target, we propose a novel step-adaptive iterative refinement network (SIRNet), which decomposes the complex learning process into several refinement steps while adapting the refinement steps specifically for each input. Specifically, the SIRNet takes the output of the backbone as a rough network prediction and iteratively refines it via an iterative refinement module (IRM). The IRM cascades several refinement blocks sharing the same structure for progressive self-correcting. For each refinement block, the goal is to improve the output of the previous refinement block under the guidance of height-wise context. In this way, the IRM is capable of improving the rough network prediction step by step, and the refined features are increasingly focused on more discriminative scene regions as they are iteratively refined. In addition, considering different characteristics of input images, we devise an adaptive step estimation (ASE) mechanism, which enables our SIRNet to adapt the number of refinement steps to each input automatically. Concretely, the ASE is performed by comparing features at adjacent refinement steps, estimating whether the next step brings improvements, and finally making a halting decision at each refinement step. With the ASE, our SIRNet becomes a dynamic architecture that considers different characteristics of the inputs when performing the iterative refinement. 
Extensive experiments demonstrate that our SIRNet performs favorably against the state-of-the-art methods on the CVUSA and the CVACT datasets. Furthermore, quantitative and qualitative experimental results demonstrate our approach's wide applicability, impressive generalization ability, and robustness.</description><subject>Adaptive estimation</subject><subject>convolutional neural network</subject><subject>cross-view geo-localization</subject><subject>Decision making</subject><subject>Estimation</subject><subject>image retrieval</subject><subject>Iterative methods</subject><subject>iterative refinement</subject><subject>Localization</subject><subject>Location awareness</subject><subject>Satellite imagery</subject><subject>Task analysis</subject><subject>Transforms</subject><subject>Transient analysis</subject><subject>Visualization</subject><issn>0196-2892</issn><issn>1558-0644</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kE1LAzEQhoMoWKs_QLwEPHhKzcdmd-OtFq2FQqGtFryE7HaiW9vNmk2V-uvdtcXTDMPzzjAPQpeM9hij6nY-nM56nHLeE5xRpuQR6jApU0LjKDpGnWYUE54qforO6npFKYskSzrodRRuajz5MDscHL4HvPCufLvDA-_qmrwU8I2H4MjY5WZd_JhQuBIvivCOZwEq0l-aKhRfgEcBvPnrpmCLEjZQhnN0Ys26hotD7aLnx4f54ImMJ8PRoD8mOeciEGFVkpnM5JYDNXkqI6B5JmwGSlJultQyGfEktolSmYk5SGBqCVTmmTWGgeii6_3eyrvPLdRBr9zWl81JzRMRiTSmNGkotqfy9jMPVle-2Bi_04zqVqFuFepWoT4obDJX-0wBAP-8UkwoycUvhHZtYg</recordid><startdate>2022</startdate><enddate>2022</enddate><creator>Lu, Xiufan</creator><creator>Luo, Siqi</creator><creator>Zhu, Yingying</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7UA</scope><scope>8FD</scope><scope>C1K</scope><scope>F1W</scope><scope>FR3</scope><scope>H8D</scope><scope>H96</scope><scope>KR7</scope><scope>L.G</scope><scope>L7M</scope><orcidid>https://orcid.org/0000-0002-3475-6186</orcidid></search><sort><creationdate>2022</creationdate><title>It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement</title><author>Lu, Xiufan ; Luo, Siqi ; Zhu, Yingying</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c223t-3f97babacf2e0ac854e0cb3fbe9502ad0f154276f799ba62e5e19de05cbfaa1e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Adaptive estimation</topic><topic>convolutional neural network</topic><topic>cross-view geo-localization</topic><topic>Decision making</topic><topic>Estimation</topic><topic>image retrieval</topic><topic>Iterative methods</topic><topic>iterative refinement</topic><topic>Localization</topic><topic>Location awareness</topic><topic>Satellite imagery</topic><topic>Task analysis</topic><topic>Transforms</topic><topic>Transient analysis</topic><topic>Visualization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lu, Xiufan</creatorcontrib><creatorcontrib>Luo, Siqi</creatorcontrib><creatorcontrib>Zhu, Yingying</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Water Resources Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ASFA: Aquatic Sciences and Fisheries 
Abstracts</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Aquatic Science & Fisheries Abstracts (ASFA) 2: Ocean Technology, Policy & Non-Living Resources</collection><collection>Civil Engineering Abstracts</collection><collection>Aquatic Science & Fisheries Abstracts (ASFA) Professional</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>IEEE transactions on geoscience and remote sensing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lu, Xiufan</au><au>Luo, Siqi</au><au>Zhu, Yingying</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement</atitle><jtitle>IEEE transactions on geoscience and remote sensing</jtitle><stitle>TGRS</stitle><date>2022</date><risdate>2022</risdate><volume>60</volume><spage>1</spage><epage>13</epage><pages>1-13</pages><issn>0196-2892</issn><eissn>1558-0644</eissn><coden>IGRSD2</coden><abstract>Cross-view image geo-localization is a challenging task of estimating the geospatial location of a street-view image by matching it with a database of geotagged aerial/satellite images, and vice versa. Compared to existing CNN-based approaches that attempt to generate discriminative representations in a single step for this task, in this article, we instead advocate endowing the network with the capability of progressive self-correcting. Toward this target, we propose a novel step-adaptive iterative refinement network (SIRNet), which decomposes the complex learning process into several refinement steps while adapting the refinement steps specifically for each input. Specifically, the SIRNet takes the output of the backbone as a rough network prediction and iteratively refines it via an iterative refinement module (IRM). 
The IRM cascades several refinement blocks sharing the same structure for progressive self-correcting. For each refinement block, the goal is to improve the output of the previous refinement block under the guidance of height-wise context. In this way, the IRM is capable of improving the rough network prediction step by step, and the refined features are increasingly focused on more discriminative scene regions as they are iteratively refined. In addition, considering different characteristics of input images, we devise an adaptive step estimation (ASE) mechanism, which enables our SIRNet to adapt the number of refinement steps to each input automatically. Concretely, the ASE is performed by comparing features at adjacent refinement steps, estimating whether the next step brings improvements, and finally making a halting decision at each refinement step. With the ASE, our SIRNet becomes a dynamic architecture that considers different characteristics of the inputs when performing the iterative refinement. Extensive experiments demonstrate that our SIRNet performs favorably against the state-of-the-art methods on the CVUSA and the CVACT datasets. Furthermore, quantitative and qualitative experimental results demonstrate our approach's wide applicability, impressive generalization ability, and robustness.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TGRS.2022.3210195</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0002-3475-6186</orcidid></addata></record>
fulltext	fulltext_linktorsrc
identifier	ISSN: 0196-2892
ispartof	IEEE transactions on geoscience and remote sensing, 2022, Vol.60, p.1-13
issn	0196-2892 1558-0644
language	eng
recordid	cdi_ieee_primary_9913952
source	IEEE Electronic Library (IEL)
subjects	Adaptive estimation; convolutional neural network; cross-view geo-localization; Decision making; Estimation; image retrieval; Iterative methods; iterative refinement; Localization; Location awareness; Satellite imagery; Task analysis; Transforms; Transient analysis; Visualization
title	It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T05%3A36%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=It's%20Okay%20to%20Be%20Wrong:%20Cross-View%20Geo-Localization%20With%20Step-Adaptive%20Iterative%20Refinement&rft.jtitle=IEEE%20transactions%20on%20geoscience%20and%20remote%20sensing&rft.au=Lu,%20Xiufan&rft.date=2022&rft.volume=60&rft.spage=1&rft.epage=13&rft.pages=1-13&rft.issn=0196-2892&rft.eissn=1558-0644&rft.coden=IGRSD2&rft_id=info:doi/10.1109/TGRS.2022.3210195&rft_dat=%3Cproquest_RIE%3E2734386007%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2734386007&rft_id=info:pmid/&rft_ieee_id=9913952&rfr_iscdi=true
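
The `url` field is an OpenURL whose `rft.*` query parameters repeat the citation data above in percent-encoded form. As a minimal sketch, assuming only Python's standard `urllib.parse`, the referent fields can be recovered as follows. The helper name `openurl_fields` is hypothetical, and `EXAMPLE_URL` is an abbreviated copy of the link that keeps only a few of its parameters.

```python
from urllib.parse import parse_qs, urlsplit


def openurl_fields(url):
    """Return the rft.* (referent) parameters of an OpenURL as a plain dict.

    Keys are returned without the "rft." prefix; parse_qs percent-decodes
    the values. If a key repeats, only its first value is kept.
    """
    params = parse_qs(urlsplit(url).query)
    prefix = "rft."
    return {
        key[len(prefix):]: values[0]
        for key, values in params.items()
        if key.startswith(prefix)
    }


# Abbreviated copy of the record's link (most parameters omitted for brevity).
EXAMPLE_URL = (
    "https://sfx.bib-bvb.de/sfx_tum?url_ver=Z39.88-2004"
    "&rft.genre=article"
    "&rft.jtitle=IEEE%20transactions%20on%20geoscience%20and%20remote%20sensing"
    "&rft.date=2022&rft.volume=60&rft.spage=1&rft.epage=13"
    "&rft.issn=0196-2892&rft.eissn=1558-0644"
)

fields = openurl_fields(EXAMPLE_URL)
for name in sorted(fields):
    print(f"{name}: {fields[name]}")
```

Parameters outside the `rft.` namespace, such as `url_ver`, are deliberately left out of the result because they describe the request rather than the cited article.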