A Transformer-Based Adaptive Semantic Aggregation Method for UAV Visual Geo-Localization
Saved in:

| Main authors: | , , , |
|---|---|
| Format: | Article |
| Language: | eng |
| Subjects: | |
| Online access: | Order full text |
Summary: Published by the Chinese Conference on Pattern Recognition and Computer Vision (PRCV) 2023. This paper addresses the task of Unmanned Aerial Vehicle (UAV) visual geo-localization, which aims to match images of the same geographic target taken by different platforms, i.e., UAVs and satellites. In general, the key to accurate UAV-satellite image matching lies in extracting visual features that are robust against viewpoint changes, scale variations, and rotations. Current works have shown that part matching is crucial for UAV visual geo-localization, since part-level representations can capture image details and help to understand the semantic information of scenes. However, the importance of preserving semantic characteristics in part-level representations has not been well discussed. In this paper, we introduce a transformer-based adaptive semantic aggregation method that regards parts as the most representative semantics in an image. Correlations of image patches to different parts are learned from the transformer's feature map. Our method then decomposes each part-level feature into an adaptive sum of all patch features, encouraging the learned parts to focus on patches with typical semantics. Extensive experiments on the University-1652 dataset demonstrate the superiority of our method over current works.
DOI: 10.48550/arxiv.2401.01574
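
The abstract's core mechanism, forming each part-level feature as a correlation-weighted ("adaptive") sum of transformer patch features, can be sketched in a few lines of PyTorch. This is a minimal illustration under assumed names and shapes (AdaptiveSemanticAggregation, part_queries, embed_dim=768, num_parts=4 are all hypothetical), not the authors' implementation.

```python
# Minimal sketch of adaptive semantic aggregation (assumed design, not the
# paper's code): learnable part prototypes attend over ViT patch tokens, and
# each part-level feature is the attention-weighted sum of patch features.
import torch
import torch.nn as nn


class AdaptiveSemanticAggregation(nn.Module):
    def __init__(self, embed_dim: int = 768, num_parts: int = 4):
        super().__init__()
        # One learnable prototype per semantic part (hypothetical parameter).
        self.part_queries = nn.Parameter(torch.randn(num_parts, embed_dim))

    def forward(self, patch_feats: torch.Tensor) -> torch.Tensor:
        # patch_feats: (batch, num_patches, embed_dim) from a ViT backbone.
        # Correlation of every patch to every part, normalized over patches.
        logits = torch.einsum("bnd,kd->bkn", patch_feats, self.part_queries)
        weights = logits.softmax(dim=-1)  # (batch, num_parts, num_patches)
        # Decompose each part-level feature into an adaptive sum of patches.
        return torch.einsum("bkn,bnd->bkd", weights, patch_feats)


# Usage: aggregate 14x14 ViT patch tokens into four part-level descriptors.
feats = torch.randn(2, 196, 768)
parts = AdaptiveSemanticAggregation()(feats)
print(parts.shape)  # torch.Size([2, 4, 768])
```

Because the weights are a softmax over patches, each part concentrates on the patches most correlated with its prototype, which matches the abstract's claim that the learned parts focus on patches with typical semantics.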