NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization

This paper presents an end-to-end neural mapping method for camera localization, dubbed NeuMap, encoding a whole scene into a grid of latent codes, with which a Transformer-based auto-decoder regresses 3D coordinates of query pixels. State-of-the-art feature matching methods require each scene to be stored as a 3D point cloud with per-point features, consuming several gigabytes of storage per scene. While compression is possible, performance drops significantly at high compression rates. Conversely, coordinate regression methods achieve high compression by storing scene information in a neural network but suffer from reduced robustness. NeuMap combines the advantages of both approaches by utilizing 1) learnable latent codes for efficient scene representation and 2) a scene-agnostic Transformer-based auto-decoder to infer coordinates for query pixels. This scene-agnostic network design learns robust matching priors from large-scale data and enables rapid optimization of codes for new scenes while keeping the network weights fixed. Extensive evaluations on five benchmarks show that NeuMap significantly outperforms other coordinate regression methods and achieves comparable performance to feature matching methods while requiring a much smaller scene representation size. For example, NeuMap achieves 39.1% accuracy in the Aachen night benchmark with only 6MB of data, whereas alternative methods require 100MB or several gigabytes and fail completely under high compression settings. The code is available at https://github.com/Tangshitao/NeuMap
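
As a reading aid, the following is a minimal sketch of the mechanism the abstract describes, assuming a PyTorch-style setup: a scene is held as a small grid of learnable latent codes, a scene-agnostic Transformer decoder cross-attends from query-pixel features to those codes and regresses 3D scene coordinates, and adapting to a new scene optimizes only the codes while the decoder weights stay frozen. Class names, dimensions, and the confidence-weighted loss below are illustrative assumptions, not the authors' released implementation (see the GitHub link above for the actual code).

```python
# Minimal sketch (not the authors' released code). All names and sizes are illustrative.
import torch
import torch.nn as nn


class SceneCodes(nn.Module):
    """Per-scene state: a small grid of learnable latent codes."""

    def __init__(self, num_codes: int = 100, dim: int = 256):
        super().__init__()
        self.codes = nn.Parameter(torch.randn(num_codes, dim) * 0.02)


class CoordinateDecoder(nn.Module):
    """Scene-agnostic auto-decoder: query pixel features attend to scene codes."""

    def __init__(self, dim: int = 256, num_layers: int = 4, num_heads: int = 8):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.to_xyz = nn.Linear(dim, 3)    # regressed 3D scene coordinate
        self.to_conf = nn.Linear(dim, 1)   # confidence / validity score

    def forward(self, pixel_feats: torch.Tensor, codes: torch.Tensor):
        # pixel_feats: (B, N, dim) features of query pixels from an image backbone
        # codes:       (num_codes, dim) latent codes of one scene
        memory = codes.unsqueeze(0).expand(pixel_feats.size(0), -1, -1)
        h = self.decoder(tgt=pixel_feats, memory=memory)
        return self.to_xyz(h), self.to_conf(h).sigmoid()


# Adapting to a new scene: keep the decoder weights frozen, optimize only the codes.
decoder = CoordinateDecoder().eval()
for p in decoder.parameters():
    p.requires_grad_(False)

scene = SceneCodes()
opt = torch.optim.Adam(scene.parameters(), lr=1e-3)

pixel_feats = torch.randn(2, 128, 256)   # dummy backbone features for 128 query pixels
gt_xyz = torch.randn(2, 128, 3)          # dummy ground-truth 3D coordinates

opt.zero_grad()
pred_xyz, conf = decoder(pixel_feats, scene.codes)
loss = ((pred_xyz - gt_xyz).abs().sum(-1) * conf.squeeze(-1)).mean()
loss.backward()
opt.step()
```

At query time, the regressed 2D-3D correspondences would feed a standard PnP + RANSAC solver to recover the camera pose; that step is omitted here.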

Bibliographic Details

Main Authors: Tang, Shitao; Tang, Sicong; Tagliasacchi, Andrea; Tan, Ping; Furukawa, Yasutaka
Format: Article
Language: English
Subjects: Computer Science - Computer Vision and Pattern Recognition
DOI: 10.48550/arxiv.2211.11177
Date: 2022-11-20
Source: arXiv.org
Rights: CC BY 4.0 (http://creativecommons.org/licenses/by/4.0)
Online Access: Full text available at https://arxiv.org/abs/2211.11177