Learning Semantic-Aware Local Features for Long Term Visual Localization

Extracting robust and discriminative local features from images plays a vital role in long-term visual localization, whose challenges are mainly caused by severe appearance differences between matching images due to day-night illumination, seasonal changes, and human activities. Existing solutions jointly learn keypoints and their descriptors in an end-to-end manner, relying on large numbers of point-correspondence annotations harvested from structure-from-motion and depth-estimation algorithms. While these methods outperform non-deep methods and two-stage deep methods (i.e., detection followed by description), they still struggle with the problems encountered in long-term visual localization. Since intrinsic semantics are invariant to local appearance changes, this paper proposes to learn semantic-aware local features in order to improve the robustness of local feature matching for long-term localization. Building on a state-of-the-art CNN architecture for local feature learning, ASLFeat, the paper leverages semantic information from an off-the-shelf semantic segmentation network to learn semantic-aware feature maps. The learned correspondence-aware feature descriptors and semantic features are then merged to form the final descriptors, whose improved matching ability is demonstrated in experiments. In addition, the semantics embedded in the features can be used to filter out noisy keypoints, leading to further accuracy improvements and faster matching. Experiments on two popular long-term visual localization benchmarks (Aachen Day-Night v1.1 and RobotCar Seasons) and one challenging indoor benchmark (InLoc) demonstrate encouraging improvements in localization accuracy over the baseline and other competitive methods.

Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE Transactions on Image Processing, 2022, Vol. 31, pp. 4842-4855
Main Authors: Fan, Bin; Zhou, Junjie; Feng, Wensen; Pu, Huayan; Yang, Yuzhu; Kong, Qingqun; Wu, Fuchao; Liu, Hongmin
Format: Article
Language: English
container_end_page 4855
container_issue
container_start_page 4842
container_title IEEE transactions on image processing
container_volume 31
creator Fan, Bin
Zhou, Junjie
Feng, Wensen
Pu, Huayan
Yang, Yuzhu
Kong, Qingqun
Wu, Fuchao
Liu, Hongmin
description Extracting robust and discriminative local features from images plays a vital role in long-term visual localization, whose challenges are mainly caused by severe appearance differences between matching images due to day-night illumination, seasonal changes, and human activities. Existing solutions jointly learn keypoints and their descriptors in an end-to-end manner, relying on large numbers of point-correspondence annotations harvested from structure-from-motion and depth-estimation algorithms. While these methods outperform non-deep methods and two-stage deep methods (i.e., detection followed by description), they still struggle with the problems encountered in long-term visual localization. Since intrinsic semantics are invariant to local appearance changes, this paper proposes to learn semantic-aware local features in order to improve the robustness of local feature matching for long-term localization. Building on a state-of-the-art CNN architecture for local feature learning, ASLFeat, the paper leverages semantic information from an off-the-shelf semantic segmentation network to learn semantic-aware feature maps. The learned correspondence-aware feature descriptors and semantic features are then merged to form the final descriptors, whose improved matching ability is demonstrated in experiments. In addition, the semantics embedded in the features can be used to filter out noisy keypoints, leading to further accuracy improvements and faster matching. Experiments on two popular long-term visual localization benchmarks (Aachen Day-Night v1.1 and RobotCar Seasons) and one challenging indoor benchmark (InLoc) demonstrate encouraging improvements in localization accuracy over the baseline and other competitive methods.
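The description above explains the method's two mechanisms only at a high level: semantic features from an off-the-shelf segmentation network are merged with correspondence-aware descriptors, and the embedded semantics are used to discard noisy keypoints. The PyTorch sketch below illustrates how such a pipeline could look. It is a minimal illustration under stated assumptions, not the authors' implementation: the function names, tensor shapes, and class IDs (fuse_descriptors, filter_keypoints, UNSTABLE_CLASSES) are all hypothetical, and the exact fusion operator used in the paper is not specified in the abstract.

```python
# Sketch (not the authors' code) of the two ideas in the abstract:
# (1) fuse correspondence-aware local descriptors with semantic features
#     from a segmentation network;
# (2) filter keypoints that fall on semantically unstable regions.
import torch
import torch.nn.functional as F

# Hypothetical IDs of appearance-unstable classes (e.g., sky, person, car);
# real IDs depend on the segmentation network's label set.
UNSTABLE_CLASSES = torch.tensor([10, 11, 13])


def fuse_descriptors(local_desc: torch.Tensor, semantic_feat: torch.Tensor) -> torch.Tensor:
    """Merge a (B, C1, H, W) local feature map with a (B, C2, H2, W2)
    semantic feature map into one L2-normalized descriptor map."""
    # Bring the semantic features to the local feature resolution.
    semantic_feat = F.interpolate(
        semantic_feat, size=local_desc.shape[-2:], mode="bilinear", align_corners=False
    )
    # Concatenate along the channel dimension, then normalize so that
    # descriptor distances are comparable across images.
    fused = torch.cat([local_desc, semantic_feat], dim=1)
    return F.normalize(fused, p=2, dim=1)


def filter_keypoints(keypoints: torch.Tensor, seg_labels: torch.Tensor) -> torch.Tensor:
    """Drop (N, 2) integer (x, y) keypoints whose class ID in the (H, W)
    per-pixel segmentation map belongs to an unstable class."""
    classes = seg_labels[keypoints[:, 1], keypoints[:, 0]]
    keep = ~torch.isin(classes, UNSTABLE_CLASSES)
    return keypoints[keep]


if __name__ == "__main__":
    local = torch.randn(1, 128, 60, 80)   # e.g., an ASLFeat-style dense descriptor map
    sem = torch.randn(1, 19, 120, 160)    # e.g., segmentation features at higher resolution
    desc = fuse_descriptors(local, sem)   # -> shape (1, 147, 60, 80)
    print(desc.shape)
```

Concatenation followed by L2 normalization is one simple way to merge the two feature sources and keep matching by cosine/Euclidean distance meaningful; a learned fusion layer would be an equally plausible reading of "merged" in the abstract.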
doi_str_mv 10.1109/TIP.2022.3187565
format Article
fulltext fulltext_linktorsrc
identifier ISSN: 1057-7149
ispartof IEEE transactions on image processing, 2022, Vol.31, p.4842-4855
issn 1057-7149
1941-0042
language eng
recordid cdi_proquest_journals_2691864960
source IEEE Electronic Library (IEL)
subjects Algorithms
Benchmark testing
Benchmarks
Feature extraction
Feature maps
Image annotation
image matching
Image segmentation
knowledge distillation
Learning
Local feature
Localization
Location awareness
Long term
Machine learning
Matching
Night
Seasonal variations
Semantics
Three-dimensional displays
Visual discrimination
visual localization
Visualization
title Learning Semantic-Aware Local Features for Long Term Visual Localization