Multi-Scale Fusion Uncrewed Aerial Vehicle Detection Based on RT-DETR

With the rapid development of science and technology, uncrewed aerial vehicle (UAV) technology has shown a wide range of application prospects in various fields. The accuracy and real-time performance of UAV target detection play a vital role in ensuring safety and improving the work efficiency of U...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Electronics (Basel) 2024-04, Vol.13 (8), p.1489
Hauptverfasser: Zhu, Minling, Kong, En
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:With the rapid development of science and technology, uncrewed aerial vehicle (UAV) technology has shown a wide range of application prospects in various fields. The accuracy and real-time performance of UAV target detection play a vital role in ensuring safety and improving the work efficiency of UAVs. Aimed at the challenges faced by the current UAV detection field, this paper proposes the Gathering Cascaded Dilated DETR (GCD-DETR) model, which aims to improve the accuracy and efficiency of UAV target detection. The main innovations of this paper are as follows: (1) The Dilated Re-param Block is creatively applied to the dilatation-wise Residual module, which uses the large kernel convolution and the parallel small kernel convolution together and fuses the feature maps generated by multi-scale perception, greatly improving the feature extraction ability, thereby improving the accuracy of UAV detection. (2) The Gather-and-Distribute mechanism is introduced to effectively enhance the ability of multi-scale feature fusion so that the model can make full use of the feature information extracted from the backbone network and further improve the detection performance. (3) The Cascaded Group Attention mechanism is innovatively introduced, which not only saves the computational cost but also improves the diversity of attention by dividing the attention head in different ways, thus enhancing the ability of the model to process complex scenes. In order to verify the effectiveness of the proposed model, this paper conducts experiments on multiple UAV datasets of complex scenes. The experimental results show that the accuracy of the improved RT-DETR model proposed in this paper on the two UAV datasets reaches 0.956 and 0.978, respectively, which is 2% and 1.1% higher than that of the original RT-DETR model. At the same time, the FPS of the model is also improved by 10 frames per second, which achieves an effective balance between accuracy and speed.
ISSN:2079-9292
2079-9292
DOI:10.3390/electronics13081489