Small object detection in unmanned aerial vehicle images using multi-scale hybrid attention

Bibliographic details
Published in: Engineering applications of artificial intelligence, 2024-02, Vol. 128, p. 107455, Article 107455
Main authors: Song, Gang; Du, Hongwei; Zhang, Xinyue; Bao, Fangxun; Zhang, Yunfeng
Format: Article
Language: English
Description
Abstract: Small object detection in unmanned aerial vehicle images is challenging because small objects have low resolution and carry limited information. Many feature enhancement methods have been introduced to improve the detection of small objects, but the effective information they extract is still insufficient, and interference from redundant information remains an issue. In this paper, we propose a new multi-scale hybrid attention based detector (MHA-YOLOv5), which integrates the similarity relationships between objects into You Only Look Once version 5 (YOLOv5) for small object detection. Specifically, a novel multi-scale hybrid attention (MHA) structure is proposed to enhance the feature representation of small objects. This structure contains three modules: multi-scale attention (MsA), a foreground enhancement module (FEM) and depthwise separable channel attention (DSCA). The MsA module builds connections, across features at multiple scales, between large objects with abundant details and small objects with insufficient features, and captures the similarity relationships between objects. To reduce the interference of redundant information, the FEM focuses on the foreground of the multi-scale features, and the DSCA module extracts multidimensional channel information. Extensive experiments on the challenging VisDrone2019-DET, UAVDT and CARPK datasets demonstrate the effectiveness of the proposed approach. Compared with YOLOv5, MHA-YOLOv5 achieves a 2.82% mean average precision (mAP) improvement on the VisDrone2019-DET dataset, a 2.25% mAP improvement on the UAVDT dataset, and a 3.07% mAP improvement on the CARPK dataset.
• Similarity relationships between objects are important supplementary information.
• The MHA structure is innovatively designed to enhance the feature representation.
• The correlation of different scale features is established.
• Foreground and channel information of multiple scale features are considered.
ISSN: 0952-1976, 1873-6769
DOI: 10.1016/j.engappai.2023.107455