SNAP: Self-Supervised Neural Maps for Visual Positioning and Semantic Understanding
Format: Article
Language: English
Abstract: Semantic 2D maps are commonly used by humans and machines for navigation purposes, whether it's walking or driving. However, these maps have limitations: they lack detail, often contain inaccuracies, and are difficult to create and maintain, especially in an automated fashion. Can we use raw imagery to automatically create better maps that can be easily interpreted by both humans and machines? We introduce SNAP, a deep network that learns rich neural 2D maps from ground-level and overhead images. We train our model to align neural maps estimated from different inputs, supervised only with camera poses over tens of millions of StreetView images. SNAP can resolve the location of challenging image queries beyond the reach of traditional methods, outperforming the state of the art in localization by a large margin. Moreover, our neural maps encode not only geometry and appearance but also high-level semantics, discovered without explicit supervision. This enables effective pre-training for data-efficient semantic scene understanding, with the potential to unlock cost-efficient creation of more detailed maps.
DOI: 10.48550/arxiv.2306.05407