NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization
This paper presents an end-to-end neural mapping method for camera localization, dubbed NeuMap, encoding a whole scene into a grid of latent codes, with which a Transformer-based auto-decoder regresses 3D coordinates of query pixels. State-of-the-art feature matching methods require each scene to be...
Saved in:
Main authors: | Tang, Shitao; Tang, Sicong; Tagliasacchi, Andrea; Tan, Ping; Furukawa, Yasutaka |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Computer Vision and Pattern Recognition |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Tang, Shitao; Tang, Sicong; Tagliasacchi, Andrea; Tan, Ping; Furukawa, Yasutaka |
description | This paper presents an end-to-end neural mapping method for camera
localization, dubbed NeuMap, encoding a whole scene into a grid of latent
codes, with which a Transformer-based auto-decoder regresses 3D coordinates of
query pixels. State-of-the-art feature matching methods require each scene to
be stored as a 3D point cloud with per-point features, consuming several
gigabytes of storage per scene. While compression is possible, performance
drops significantly at high compression rates. Conversely, coordinate
regression methods achieve high compression by storing scene information in a
neural network but suffer from reduced robustness. NeuMap combines the
advantages of both approaches by utilizing 1) learnable latent codes for
efficient scene representation and 2) a scene-agnostic Transformer-based
auto-decoder to infer coordinates for query pixels. This scene-agnostic network
design learns robust matching priors from large-scale data and enables rapid
optimization of codes for new scenes while keeping the network weights fixed.
Extensive evaluations on five benchmarks show that NeuMap significantly
outperforms other coordinate regression methods and achieves comparable
performance to feature matching methods while requiring a much smaller scene
representation size. For example, NeuMap achieves 39.1% accuracy in the Aachen
night benchmark with only 6MB of data, whereas alternative methods require
100MB or several gigabytes and fail completely under high compression settings.
The code is available at https://github.com/Tangshitao/NeuMap |
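
As a rough illustration of the mechanism the abstract describes (per-scene learnable latent codes decoded by a frozen, scene-agnostic transformer that regresses 3D coordinates for query pixels), here is a minimal PyTorch sketch. It is not the authors' implementation: the class name, dimensions, and training loop are illustrative assumptions, and the real pipeline would use image backbone features and ground-truth scene coordinates rather than the random stand-in tensors used here.

```python
# Minimal sketch of the idea in the abstract: scene codes + scene-agnostic decoder.
# All names and sizes are illustrative assumptions, not the authors' implementation.
import torch
import torch.nn as nn


class AutoTransdecoder(nn.Module):
    """Scene-agnostic decoder: query pixel features cross-attend to scene latent codes."""

    def __init__(self, dim=256, num_layers=4, num_heads=8):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=num_heads, batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.to_xyz = nn.Linear(dim, 3)  # regress one 3D scene coordinate per query pixel

    def forward(self, pixel_feats, scene_codes):
        # pixel_feats: (B, N, dim) features of the query pixels
        # scene_codes: (B, K, dim) learnable latent codes representing the scene
        x = self.decoder(tgt=pixel_feats, memory=scene_codes)
        return self.to_xyz(x)  # (B, N, 3) predicted scene coordinates


# Adapting to a new scene: freeze the shared decoder, optimize only the latent codes.
decoder = AutoTransdecoder()
for p in decoder.parameters():
    p.requires_grad_(False)

scene_codes = nn.Parameter(0.01 * torch.randn(1, 64, 256))  # grid of per-scene latent codes
optimizer = torch.optim.Adam([scene_codes], lr=1e-3)

pixel_feats = torch.randn(1, 512, 256)  # stand-in for image backbone features
gt_xyz = torch.randn(1, 512, 3)         # stand-in for ground-truth 3D coordinates

for _ in range(10):
    pred = decoder(pixel_feats, scene_codes)
    loss = nn.functional.l1_loss(pred, gt_xyz)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The design choice mirrored here is the one the abstract emphasizes: adapting to a new scene only optimizes the latent codes while the decoder weights stay fixed, which is what keeps the per-scene representation small.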
doi_str_mv | 10.48550/arxiv.2211.11177 |
format | Article |
creationdate | 2022-11-20 |
rights | http://creativecommons.org/licenses/by/4.0 (free to read) |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2211.11177 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2211_11177 |
source | arXiv.org |
subjects | Computer Science - Computer Vision and Pattern Recognition |
title | NeuMap: Neural Coordinate Mapping by Auto-Transdecoder for Camera Localization |
url | https://arxiv.org/abs/2211.11177 |