AGCosPlace: A UAV Visual Positioning Algorithm Based on Transformer

Bibliographic details
Published in:	Drones (Basel) 2023-08, Vol.7 (8), p.498
Main authors:	Guo, Ya; Zhou, Yatong; Yang, Fan
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; Border patrol; Cameras; Coding; Computer networks; Control systems; Datasets; Drone aircraft; Environmental monitoring; Feature maps; Global positioning systems; GPS; graph network; Ground stations; Image retrieval; Labels; Localization; Modules; Multilayer perceptrons; Multilayers; Navigation systems; Neural networks; Sensors; Surveillance; Test sets; transformer; UAV visual navigation; Unmanned aerial vehicles; visual positioning; Visual tasks
Online access:	Full text

Description
Summary:	To address the limitation and obtain the position of the drone even when the relative poses and intrinsics of the drone camera are unknown, we propose AGCosPlace, a visual positioning algorithm based on image retrieval that leverages the Transformer architecture to improve performance. Our approach encodes the feature map of the backbone with an operation that combines attention mechanisms, multi-layer perceptron coding, and a graph network module, which allows better aggregation of the context information present in the image. An aggregation module with dynamic adaptive pooling then produces a descriptor of appropriate dimensionality, which is passed to the classifier to recognize the position. Because labeling UAV images for visual positioning is complex, the network is trained on the publicly available Google Street View SF-XL dataset, and the trained model is evaluated on a custom UAV-perspective test set. The experimental results show that the proposed algorithm, which improves upon the ResNet backbone networks on the SF-XL test set, also achieves excellent performance on the UAV test set, with notable improvements in four evaluation metrics: R@1, R@5, R@10, and R@20. These results confirm that the trained visual positioning network can be employed effectively in UAV visual positioning tasks.
ISSN:	2504-446X
DOI:	10.3390/drones7080498
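
Note on the evaluation metrics: the summary reports R@1, R@5, R@10, and R@20 without defining them. In image-retrieval-based positioning, R@N is usually the percentage of query images for which at least one of the N nearest database images (by descriptor distance) lies close enough to the query's true position. The following is a minimal sketch of that computation, not the authors' evaluation code. The function names, the toy data, and the 25 m distance threshold are assumptions made for illustration; the record does not state the threshold used in the article.

```python
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recall_at_n(query_desc, query_pos, db_desc, db_pos,
                ns=(1, 5, 10, 20), threshold_m=25.0):
    """Return {N: percentage of queries with a correct match in the top N}.

    threshold_m is an assumed value (hypothetical here); positions are
    planar coordinates in metres.
    """
    hits = {n: 0 for n in ns}
    for q_desc, q_pos in zip(query_desc, query_pos):
        # Rank database images by descriptor distance, nearest first.
        ranked = sorted(range(len(db_desc)),
                        key=lambda i: l2_distance(q_desc, db_desc[i]))
        for n in ns:
            # Hit: any of the top-N retrieved images is within threshold_m
            # of the query's ground-truth position.
            if any(l2_distance(q_pos, db_pos[i]) <= threshold_m
                   for i in ranked[:n]):
                hits[n] += 1
    return {n: 100.0 * hits[n] / len(query_desc) for n in ns}


# Toy data: four database images and two queries (all values made up).
db_desc = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
db_pos = [(0.0, 0.0), (100.0, 0.0), (0.0, 100.0), (100.0, 100.0)]
query_desc = [[0.1, 0.0], [0.6, 0.9]]
query_pos = [(5.0, 5.0), (0.0, 95.0)]

print(recall_at_n(query_desc, query_pos, db_desc, db_pos, ns=(1, 2)))
# prints {1: 50.0, 2: 100.0}
```

In the toy data, the first query's nearest descriptor belongs to a database image about 7 m away, so it counts at N=1. The second query's nearest descriptor belongs to an image roughly 100 m away, and only its second-nearest match is within the threshold, so it counts from N=2 onward.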