UAV-YOLOv5: A Swin-Transformer-Enabled Small Object Detection Model for Long-Range UAV Images

Bibliographic details
Published in:	Annals of Data Science 2024-08, Vol.11 (4), p.1109-1138
Main authors:	Li, Jun; Xie, Chong; Wu, Sizheng; Ren, Yawei
Format:	Article
Language:	English
Subjects:	Accuracy; Algorithms; Artificial Intelligence; Business and Management; Datasets; Economics; Feature extraction; Finance; Insurance; Management; Modules; Object recognition; Optoelectronics; Statistics for Business; Target acquisition; Target detection; Transformers; Unmanned aerial vehicles
Online access:	Full text

Description
Abstract:	This paper tackles the low recognition accuracy and occlusion problems that arise when detecting long-range, small targets such as UAVs. We introduce a detection framework named UAV-YOLOv5, which combines the strengths of Swin Transformer V2 and YOLOv5. First, we introduce Focal-EIOU, a refinement of the K-means algorithm tailored to generate anchor boxes better suited to the current dataset, thereby improving detection performance. Second, the convolutional and pooling layers in the network with a stride greater than 1 are replaced to prevent information loss during feature extraction. Third, the Swin Transformer V2 module is introduced in the Neck to improve model accuracy, and the BiFormer module is introduced so that the model can capture global and local feature information at the same time. In addition, BiFPN replaces the original FPN structure so that the network can acquire richer semantic information and fuse features across scales more effectively. Lastly, a small-target detection head is added to the existing architecture, improving the model's precision on smaller targets. Experiments on the comprehensive dataset verify the effectiveness of UAV-YOLOv5, which achieves an average accuracy of 87%. Compared with YOLOv5, the mAP of UAV-YOLOv5 is improved by 8.5%, which verifies its capability for high-precision, long-range, small-target UAV optoelectronic detection.
ISSN:	2198-5804 2198-5812
DOI:	10.1007/s40745-024-00546-z
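
The anchor-generation step mentioned in the abstract can be illustrated with a minimal sketch. This record does not describe how Focal-EIOU modifies K-means, so the code below is an assumption rather than the authors' method: it shows only the plain K-means anchor clustering commonly used with YOLO-style detectors, with IoU computed on box width and height as the similarity measure. The names `wh_iou` and `kmeans_anchors` and the sample box sizes are hypothetical and not taken from the paper.

```python
import numpy as np


def wh_iou(boxes, anchors):
    """IoU between (N, 2) boxes and (K, 2) anchors given as (width, height).

    All rectangles are treated as sharing the same top-left corner, so only
    their sizes matter. Returns an (N, K) matrix.
    """
    inter = (np.minimum(boxes[:, None, 0], anchors[None, :, 0])
             * np.minimum(boxes[:, None, 1], anchors[None, :, 1]))
    area_boxes = boxes[:, 0] * boxes[:, 1]
    area_anchors = anchors[:, 0] * anchors[:, 1]
    union = area_boxes[:, None] + area_anchors[None, :] - inter
    return inter / union


def kmeans_anchors(boxes, k, iters=100, seed=0):
    """Plain K-means on box sizes, assigning each box to its highest-IoU anchor.

    Illustrative only: this is NOT the Focal-EIOU refinement named in the
    abstract, whose details are not given in this record.
    """
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialise anchors from k distinct ground-truth box sizes.
    anchors = boxes[rng.choice(len(boxes), size=k, replace=False)].copy()
    for _ in range(iters):
        assign = np.argmax(wh_iou(boxes, anchors), axis=1)
        updated = anchors.copy()
        for j in range(k):
            members = boxes[assign == j]
            if len(members):  # an empty cluster keeps its previous anchor
                updated[j] = members.mean(axis=0)
        if np.allclose(updated, anchors):
            break
        anchors = updated
    # Sort by area so small-target anchors come first.
    return anchors[np.argsort(anchors.prod(axis=1))]


if __name__ == "__main__":
    # Hypothetical (width, height) pairs in pixels: three small, three large.
    sample = [[10, 12], [11, 10], [12, 11], [100, 90], [95, 105], [110, 100]]
    print(kmeans_anchors(sample, k=2))
```

In practice the input would be the width and height of every labelled box in the training set, and `k` would match the number of anchors the detector expects across its detection heads.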