It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement
Cross-view image geo-localization is a challenging task of estimating the geospatial location of a street-view image by matching it with a database of geotagged aerial/satellite images, and vice versa. Compared to existing CNN-based approaches that attempt to generate discriminative representations...
Published in: | IEEE transactions on geoscience and remote sensing 2022, Vol.60, p.1-13 |
---|---|
Main authors: | Lu, Xiufan ; Luo, Siqi ; Zhu, Yingying |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 13 |
---|---|
container_issue | |
container_start_page | 1 |
container_title | IEEE transactions on geoscience and remote sensing |
container_volume | 60 |
creator | Lu, Xiufan Luo, Siqi Zhu, Yingying |
description | Cross-view image geo-localization is a challenging task of estimating the geospatial location of a street-view image by matching it with a database of geotagged aerial/satellite images, and vice versa. Compared to existing CNN-based approaches that attempt to generate discriminative representations in a single step for this task, in this article, we instead advocate endowing the network with the capability of progressive self-correcting. Toward this target, we propose a novel step-adaptive iterative refinement network (SIRNet), which decomposes the complex learning process into several refinement steps while adapting the refinement steps specifically for each input. Specifically, the SIRNet takes the output of the backbone as a rough network prediction and iteratively refines it via an iterative refinement module (IRM). The IRM cascades several refinement blocks sharing the same structure for progressive self-correcting. For each refinement block, the goal is to improve the output of the previous refinement block under the guidance of height-wise context. In this way, the IRM is capable of improving the rough network prediction step by step, and the refined features are increasingly focused on more discriminative scene regions as they are iteratively refined. In addition, considering different characteristics of input images, we devise an adaptive step estimation (ASE) mechanism, which enables our SIRNet to adapt the number of refinement steps to each input automatically. Concretely, the ASE is performed by comparing features at adjacent refinement steps, estimating whether the next step brings improvements, and finally making a halting decision at each refinement step. With the ASE, our SIRNet becomes a dynamic architecture that considers different characteristics of the inputs when performing the iterative refinement. Extensive experiments demonstrate that our SIRNet performs favorably against the state-of-the-art methods on the CVUSA and the CVACT datasets. Furthermore, quantitative and qualitative experimental results demonstrate our approach's wide applicability, impressive generalization ability, and robustness. |
doi_str_mv | 10.1109/TGRS.2022.3210195 |
format | Article |
fullrecord | <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_9913952</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>9913952</ieee_id><sourcerecordid>2734386007</sourcerecordid><originalsourceid>FETCH-LOGICAL-c223t-3f97babacf2e0ac854e0cb3fbe9502ad0f154276f799ba62e5e19de05cbfaa1e3</originalsourceid><addsrcrecordid>eNo9kE1LAzEQhoMoWKs_QLwEPHhKzcdmd-OtFq2FQqGtFryE7HaiW9vNmk2V-uvdtcXTDMPzzjAPQpeM9hij6nY-nM56nHLeE5xRpuQR6jApU0LjKDpGnWYUE54qforO6npFKYskSzrodRRuajz5MDscHL4HvPCufLvDA-_qmrwU8I2H4MjY5WZd_JhQuBIvivCOZwEq0l-aKhRfgEcBvPnrpmCLEjZQhnN0Ys26hotD7aLnx4f54ImMJ8PRoD8mOeciEGFVkpnM5JYDNXkqI6B5JmwGSlJultQyGfEktolSmYk5SGBqCVTmmTWGgeii6_3eyrvPLdRBr9zWl81JzRMRiTSmNGkotqfy9jMPVle-2Bi_04zqVqFuFepWoT4obDJX-0wBAP-8UkwoycUvhHZtYg</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2734386007</pqid></control><display><type>article</type><title>It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement</title><source>IEEE Electronic Library (IEL)</source><creator>Lu, Xiufan ; Luo, Siqi ; Zhu, Yingying</creator><creatorcontrib>Lu, Xiufan ; Luo, Siqi ; Zhu, Yingying</creatorcontrib><description>Cross-view image geo-localization is a challenging task of estimating the geospatial location of a street-view image by matching it with a database of geotagged aerial/satellite images, and vice versa. Compared to existing CNN-based approaches that attempt to generate discriminative representations in a single step for this task, in this article, we instead advocate endowing the network with the capability of progressive self-correcting. Toward this target, we propose a novel step-adaptive iterative refinement network (SIRNet), which decomposes the complex learning process into several refinement steps while adapting the refinement steps specifically for each input. 
Specifically, the SIRNet takes the output of the backbone as a rough network prediction and iteratively refines it via an iterative refinement module (IRM). The IRM cascades several refinement blocks sharing the same structure for progressive self-correcting. For each refinement block, the goal is to improve the output of the previous refinement block under the guidance of height-wise context. In this way, the IRM is capable of improving the rough network prediction step by step, and the refined features are increasingly focused on more discriminative scene regions as they are iteratively refined. In addition, considering different characteristics of input images, we devise an adaptive step estimation (ASE) mechanism, which enables our SIRNet to adapt the number of refinement steps to each input automatically. Concretely, the ASE is performed by comparing features at adjacent refinement steps, estimating whether the next step brings improvements, and finally making a halting decision at each refinement step. With the ASE, our SIRNet becomes a dynamic architecture that considers different characteristics of the inputs when performing the iterative refinement. Extensive experiments demonstrate that our SIRNet performs favorably against the state-of-the-art methods on the CVUSA and the CVACT datasets. 
Furthermore, quantitative and qualitative experimental results demonstrate our approach's wide applicability, impressive generalization ability, and robustness.</description><identifier>ISSN: 0196-2892</identifier><identifier>EISSN: 1558-0644</identifier><identifier>DOI: 10.1109/TGRS.2022.3210195</identifier><identifier>CODEN: IGRSD2</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>Adaptive estimation ; convolutional neural network ; cross-view geo-localization ; Decision making ; Estimation ; image retrieval ; Iterative methods ; iterative refinement ; Localization ; Location awareness ; Satellite imagery ; Task analysis ; Transforms ; Transient analysis ; Visualization</subject><ispartof>IEEE transactions on geoscience and remote sensing, 2022, Vol.60, p.1-13</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2022</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c223t-3f97babacf2e0ac854e0cb3fbe9502ad0f154276f799ba62e5e19de05cbfaa1e3</citedby><cites>FETCH-LOGICAL-c223t-3f97babacf2e0ac854e0cb3fbe9502ad0f154276f799ba62e5e19de05cbfaa1e3</cites><orcidid>0000-0002-3475-6186</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/9913952$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,4010,27900,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/9913952$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Lu, Xiufan</creatorcontrib><creatorcontrib>Luo, Siqi</creatorcontrib><creatorcontrib>Zhu, Yingying</creatorcontrib><title>It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement</title><title>IEEE transactions on geoscience and remote 
sensing</title><addtitle>TGRS</addtitle><description>Cross-view image geo-localization is a challenging task of estimating the geospatial location of a street-view image by matching it with a database of geotagged aerial/satellite images, and vice versa. Compared to existing CNN-based approaches that attempt to generate discriminative representations in a single step for this task, in this article, we instead advocate endowing the network with the capability of progressive self-correcting. Toward this target, we propose a novel step-adaptive iterative refinement network (SIRNet), which decomposes the complex learning process into several refinement steps while adapting the refinement steps specifically for each input. Specifically, the SIRNet takes the output of the backbone as a rough network prediction and iteratively refines it via an iterative refinement module (IRM). The IRM cascades several refinement blocks sharing the same structure for progressive self-correcting. For each refinement block, the goal is to improve the output of the previous refinement block under the guidance of height-wise context. In this way, the IRM is capable of improving the rough network prediction step by step, and the refined features are increasingly focused on more discriminative scene regions as they are iteratively refined. In addition, considering different characteristics of input images, we devise an adaptive step estimation (ASE) mechanism, which enables our SIRNet to adapt the number of refinement steps to each input automatically. Concretely, the ASE is performed by comparing features at adjacent refinement steps, estimating whether the next step brings improvements, and finally making a halting decision at each refinement step. With the ASE, our SIRNet becomes a dynamic architecture that considers different characteristics of the inputs when performing the iterative refinement. 
Extensive experiments demonstrate that our SIRNet performs favorably against the state-of-the-art methods on the CVUSA and the CVACT datasets. Furthermore, quantitative and qualitative experimental results demonstrate our approach's wide applicability, impressive generalization ability, and robustness.</description><subject>Adaptive estimation</subject><subject>convolutional neural network</subject><subject>cross-view geo-localization</subject><subject>Decision making</subject><subject>Estimation</subject><subject>image retrieval</subject><subject>Iterative methods</subject><subject>iterative refinement</subject><subject>Localization</subject><subject>Location awareness</subject><subject>Satellite imagery</subject><subject>Task analysis</subject><subject>Transforms</subject><subject>Transient analysis</subject><subject>Visualization</subject><issn>0196-2892</issn><issn>1558-0644</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2022</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNo9kE1LAzEQhoMoWKs_QLwEPHhKzcdmd-OtFq2FQqGtFryE7HaiW9vNmk2V-uvdtcXTDMPzzjAPQpeM9hij6nY-nM56nHLeE5xRpuQR6jApU0LjKDpGnWYUE54qforO6npFKYskSzrodRRuajz5MDscHL4HvPCufLvDA-_qmrwU8I2H4MjY5WZd_JhQuBIvivCOZwEq0l-aKhRfgEcBvPnrpmCLEjZQhnN0Ys26hotD7aLnx4f54ImMJ8PRoD8mOeciEGFVkpnM5JYDNXkqI6B5JmwGSlJultQyGfEktolSmYk5SGBqCVTmmTWGgeii6_3eyrvPLdRBr9zWl81JzRMRiTSmNGkotqfy9jMPVle-2Bi_04zqVqFuFepWoT4obDJX-0wBAP-8UkwoycUvhHZtYg</recordid><startdate>2022</startdate><enddate>2022</enddate><creator>Lu, Xiufan</creator><creator>Luo, Siqi</creator><creator>Zhu, Yingying</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7UA</scope><scope>8FD</scope><scope>C1K</scope><scope>F1W</scope><scope>FR3</scope><scope>H8D</scope><scope>H96</scope><scope>KR7</scope><scope>L.G</scope><scope>L7M</scope><orcidid>https://orcid.org/0000-0002-3475-6186</orcidid></search><sort><creationdate>2022</creationdate><title>It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement</title><author>Lu, Xiufan ; Luo, Siqi ; Zhu, Yingying</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c223t-3f97babacf2e0ac854e0cb3fbe9502ad0f154276f799ba62e5e19de05cbfaa1e3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2022</creationdate><topic>Adaptive estimation</topic><topic>convolutional neural network</topic><topic>cross-view geo-localization</topic><topic>Decision making</topic><topic>Estimation</topic><topic>image retrieval</topic><topic>Iterative methods</topic><topic>iterative refinement</topic><topic>Localization</topic><topic>Location awareness</topic><topic>Satellite imagery</topic><topic>Task analysis</topic><topic>Transforms</topic><topic>Transient analysis</topic><topic>Visualization</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Lu, Xiufan</creatorcontrib><creatorcontrib>Luo, Siqi</creatorcontrib><creatorcontrib>Zhu, Yingying</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Water Resources Abstracts</collection><collection>Technology Research Database</collection><collection>Environmental Sciences and Pollution Management</collection><collection>ASFA: Aquatic Sciences and Fisheries 
Abstracts</collection><collection>Engineering Research Database</collection><collection>Aerospace Database</collection><collection>Aquatic Science &amp; Fisheries Abstracts (ASFA) 2: Ocean Technology, Policy &amp; Non-Living Resources</collection><collection>Civil Engineering Abstracts</collection><collection>Aquatic Science &amp; Fisheries Abstracts (ASFA) Professional</collection><collection>Advanced Technologies Database with Aerospace</collection><jtitle>IEEE transactions on geoscience and remote sensing</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Lu, Xiufan</au><au>Luo, Siqi</au><au>Zhu, Yingying</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement</atitle><jtitle>IEEE transactions on geoscience and remote sensing</jtitle><stitle>TGRS</stitle><date>2022</date><risdate>2022</risdate><volume>60</volume><spage>1</spage><epage>13</epage><pages>1-13</pages><issn>0196-2892</issn><eissn>1558-0644</eissn><coden>IGRSD2</coden><abstract>Cross-view image geo-localization is a challenging task of estimating the geospatial location of a street-view image by matching it with a database of geotagged aerial/satellite images, and vice versa. Compared to existing CNN-based approaches that attempt to generate discriminative representations in a single step for this task, in this article, we instead advocate endowing the network with the capability of progressive self-correcting. Toward this target, we propose a novel step-adaptive iterative refinement network (SIRNet), which decomposes the complex learning process into several refinement steps while adapting the refinement steps specifically for each input. Specifically, the SIRNet takes the output of the backbone as a rough network prediction and iteratively refines it via an iterative refinement module (IRM). 
The IRM cascades several refinement blocks sharing the same structure for progressive self-correcting. For each refinement block, the goal is to improve the output of the previous refinement block under the guidance of height-wise context. In this way, the IRM is capable of improving the rough network prediction step by step, and the refined features are increasingly focused on more discriminative scene regions as they are iteratively refined. In addition, considering different characteristics of input images, we devise an adaptive step estimation (ASE) mechanism, which enables our SIRNet to adapt the number of refinement steps to each input automatically. Concretely, the ASE is performed by comparing features at adjacent refinement steps, estimating whether the next step brings improvements, and finally making a halting decision at each refinement step. With the ASE, our SIRNet becomes a dynamic architecture that considers different characteristics of the inputs when performing the iterative refinement. Extensive experiments demonstrate that our SIRNet performs favorably against the state-of-the-art methods on the CVUSA and the CVACT datasets. Furthermore, quantitative and qualitative experimental results demonstrate our approach's wide applicability, impressive generalization ability, and robustness.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TGRS.2022.3210195</doi><tpages>13</tpages><orcidid>https://orcid.org/0000-0002-3475-6186</orcidid></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0196-2892 |
ispartof | IEEE transactions on geoscience and remote sensing, 2022, Vol.60, p.1-13 |
issn | 0196-2892 1558-0644 |
language | eng |
recordid | cdi_ieee_primary_9913952 |
source | IEEE Electronic Library (IEL) |
subjects | Adaptive estimation convolutional neural network cross-view geo-localization Decision making Estimation image retrieval Iterative methods iterative refinement Localization Location awareness Satellite imagery Task analysis Transforms Transient analysis Visualization |
title | It's Okay to Be Wrong: Cross-View Geo-Localization With Step-Adaptive Iterative Refinement |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-03T05%3A36%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=It's%20Okay%20to%20Be%20Wrong:%20Cross-View%20Geo-Localization%20With%20Step-Adaptive%20Iterative%20Refinement&rft.jtitle=IEEE%20transactions%20on%20geoscience%20and%20remote%20sensing&rft.au=Lu,%20Xiufan&rft.date=2022&rft.volume=60&rft.spage=1&rft.epage=13&rft.pages=1-13&rft.issn=0196-2892&rft.eissn=1558-0644&rft.coden=IGRSD2&rft_id=info:doi/10.1109/TGRS.2022.3210195&rft_dat=%3Cproquest_RIE%3E2734386007%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2734386007&rft_id=info:pmid/&rft_ieee_id=9913952&rfr_iscdi=true |
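
The description field above outlines two mechanisms: an iterative refinement module (IRM) that repeatedly improves a rough backbone output, and an adaptive step estimation (ASE) mechanism that decides, per input, when to stop refining. The control flow can be sketched in a few lines of Python. This is not the authors' implementation: the toy `refine_block`, the row-mean stand-in for height-wise context, the `tol` threshold, and every function name below are assumptions made for illustration, and the learned halting estimator described in the abstract is replaced here by a simple change threshold between adjacent steps.

```python
from typing import List, Tuple

# A feature map with H rows and W columns; a stand-in for backbone features.
FeatureMap = List[List[float]]


def height_wise_context(feat: FeatureMap) -> List[float]:
    """Hypothetical context: one value per row (height position), the row mean."""
    return [sum(row) / len(row) for row in feat]


def refine_block(feat: FeatureMap, rate: float = 0.5) -> FeatureMap:
    """Toy refinement block: move each value part of the way toward its row context."""
    ctx = height_wise_context(feat)
    return [[v + rate * (c - v) for v in row] for row, c in zip(feat, ctx)]


def mean_abs_change(a: FeatureMap, b: FeatureMap) -> float:
    """Average absolute difference between features at two adjacent steps."""
    total = sum(abs(x - y) for ra, rb in zip(a, b) for x, y in zip(ra, rb))
    count = sum(len(row) for row in a)
    return total / count


def iterative_refinement(
    rough: FeatureMap, max_steps: int = 8, tol: float = 0.05
) -> Tuple[FeatureMap, int]:
    """Refine step by step; halt once a step changes the features by less than tol.

    Returns the refined features and the number of steps actually used.
    """
    current = rough
    for step in range(1, max_steps + 1):
        refined = refine_block(current)
        change = mean_abs_change(current, refined)
        current = refined
        if change < tol:  # assumed halting rule, standing in for the ASE decision
            return current, step
    return current, max_steps


if __name__ == "__main__":
    for name, rough in [
        ("uniform rows", [[1.0, 1.0], [2.0, 2.0]]),
        ("mixed rows", [[0.0, 4.0], [1.0, 3.0]]),
    ]:
        _, steps = iterative_refinement(rough)
        print(f"{name}: halted after {steps} step(s)")
```

With these toy settings, an input whose rows are already uniform halts sooner than a less uniform one, which mirrors the per-input adaptivity the abstract attributes to the ASE. All refinement steps reuse the same `refine_block`, echoing the statement that the refinement blocks share one structure.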