GeoFormer: An Effective Transformer-Based Siamese Network for UAV Geolocalization

Bibliographic details
Published in:	IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 2024, Vol. 17, pp. 9470-9491
Main authors:	Li, Qingge; Yang, Xiaogang; Fan, Jiwei; Lu, Ruitao; Tang, Bin; Wang, Siyu; Su, Shuang
Format:	Article
Language:	English
Subjects:	Aggregation; Algorithms; Artificial neural networks; Autonomous aerial vehicles; Clustering; Computational efficiency; Computer applications; Computing costs; Cross-view image retrieval; Feature extraction; Feature maps; Feature recognition; heterologous scene matching; linear attention; Location awareness; Matching; Modules; Neural networks; Representations; Rotation; Satellite imagery; Satellites; Semantics; Siamese network; Task analysis; transformer; Transformers; unmanned aerial vehicle (UAV) geolocalization; Unmanned aerial vehicles
Online access:	Full text

Description
Abstract:	Cross-view geolocalization of unmanned aerial vehicles (UAVs) is challenging because of positional discrepancies and uncertainties in scale and distance between UAV and satellite views. Existing transformer-based geolocalization methods mainly use encoders to mine contextual image information, but they have limitations when dealing with scale changes in cross-view images. We therefore present GeoFormer, an effective transformer-based Siamese network tailored for UAV geolocalization. First, we design an efficient transformer feature extraction network that uses linear attention to reduce computational complexity and improve computational efficiency. Within this network, an efficient separable perceptron module based on depthwise separable convolution reduces computational cost while improving the feature representation. Second, we propose a multiscale feature aggregation module that deeply fuses salient features at different scales through a feedforward neural network, generating semantically rich global feature representations and improving the model's ability to capture image details and represent robust features. Third, we design a semantic-guided region segmentation module that uses a k-modes clustering algorithm to divide the feature map into multiple semantically consistent regions and performs feature recognition within each region to improve image-matching accuracy. Finally, we design a hierarchical reinforcement rotation matching strategy that builds on the retrieval results of UAV-view queries against satellite images, using SuperPoint keypoint extraction and LightGlue rotation matching, to achieve accurate UAV geolocalization. Experimental results show that the method effectively achieves UAV geolocalization.
ISSN:	1939-1404 2151-1535
DOI:	10.1109/JSTARS.2024.3392812
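
Note on linear attention: the abstract states that linear attention reduces computational complexity but does not give the formulation GeoFormer uses. The sketch below illustrates one common kernel-based variant, not the authors' implementation. The feature map elu(x) + 1 and all function names are assumptions made for illustration. It shows that reassociating the product, computing phi(K)^T V before multiplying by phi(Q), avoids building the N x N similarity matrix while giving the same result.

```python
import math


def elu_plus_one(x):
    # Assumed positive feature map phi(x) = elu(x) + 1 (hypothetical choice).
    return x + 1.0 if x > 0 else math.exp(x)


def feature_map(m):
    return [[elu_plus_one(x) for x in row] for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    return [[sum(a[i][t] * b[t][j] for t in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def quadratic_attention(q, k, v):
    # Reference form: builds the full N x N similarity matrix.
    fq, fk = feature_map(q), feature_map(k)
    scores = matmul(fq, transpose(fk))            # N x N
    out = matmul(scores, v)                       # N x d_v
    return [[x / sum(s) for x in o] for s, o in zip(scores, out)]


def linear_attention(q, k, v):
    # Reassociated form: phi(Q) @ (phi(K)^T @ V), no N x N matrix.
    fq, fk = feature_map(q), feature_map(k)
    kv = matmul(transpose(fk), v)                 # d x d_v summary of keys and values
    k_sum = [sum(col) for col in zip(*fk)]        # length-d normalizer term
    num = matmul(fq, kv)                          # N x d_v
    den = [sum(a * b for a, b in zip(row, k_sum)) for row in fq]
    return [[x / z for x in row] for row, z in zip(num, den)]
```

For N tokens of dimension d, the reference form costs on the order of N^2 d operations, whereas the reassociated form costs on the order of N d^2, which is the usual argument for linear attention on large feature maps.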