Cross-Modal Oriented Object Detection of UAV Aerial Images Based on Image Feature


Detailed Description

Bibliographic Details
Published in: IEEE Transactions on Geoscience and Remote Sensing, 2024, Vol. 62, pp. 1-21
Main Authors: Wang, Huiying; Wang, Chunping; Fu, Qiang; Zhang, Dongdong; Kou, Renke; Yu, Ying; Song, Jian
Format: Article
Language: English
Description
Abstract: Arbitrary-oriented object detection is vital for improving unmanned aerial vehicle (UAV) sensing and has promising applications. However, challenges persist in detecting objects under extreme conditions such as low illumination and strong occlusion. Cross-modal feature fusion enhances detection in complex environments, but current methods do not adequately learn the features of each modality for the current environment, resulting in degraded performance. To tackle this, we propose the cross-modal aerial remote sensing image object detection (CRSIOD) network, which effectively learns diverse sensor image features to capture distinct scenarios and target characteristics. First, we design an illumination perception module to guide the object detection network in performing various feature processing tasks. Second, to leverage the respective advantages of the two modalities and mitigate their negative impacts, we introduce an uncertainty-aware module that quantifies the uncertainty present in each modality and uses it as weights to steer the network toward learning in a direction favorable for optimal object detection. Moreover, in the object detection network, we design a two-stream backbone network based on the attention mechanism to enhance the learning of difficult samples, utilize the cross-modality attentive feature fusion (CMAFF) module to fully extract the shared and complementary features between the two modalities, and design a three-branch feature enhancement network to enhance the learning of the three modal features separately. Finally, to optimize detection results, we design light perception nonmaximum suppression (LP-NMS) and replace the horizontal detection head with a rotated one to preserve object orientation. We evaluate the proposed CRSIOD method on the public DroneVehicle dataset of UAV aerial images. Compared with existing commonly used methods, CRSIOD achieves state-of-the-art detection performance.
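The uncertainty-as-weights idea described in the abstract can be illustrated with a minimal sketch: each modality's predicted uncertainty is converted into a fusion weight, so the less reliable modality contributes less to the fused feature map. All names and the exact weighting scheme (an inverse-uncertainty softmax) are assumptions for illustration, not the paper's actual implementation:

```python
import numpy as np

def uncertainty_weighted_fusion(feat_rgb, feat_ir, sigma_rgb, sigma_ir):
    """Fuse two modality feature maps, down-weighting the more uncertain one.

    Hypothetical scheme: softmax over negated uncertainty scores, so a
    lower uncertainty yields a larger fusion weight.
    """
    logits = np.array([-sigma_rgb, -sigma_ir])
    weights = np.exp(logits) / np.exp(logits).sum()
    return weights[0] * feat_rgb + weights[1] * feat_ir

# Toy example: RGB features are confident (low sigma), IR is uncertain,
# so the fused map stays close to the RGB features.
rgb = np.ones((2, 2))
ir = np.zeros((2, 2))
fused = uncertainty_weighted_fusion(rgb, ir, sigma_rgb=0.1, sigma_ir=2.0)
```

In a real detector such weights would be predicted per sample (e.g., by a small subnetwork) rather than supplied as constants; the sketch only shows how uncertainty can bias fusion toward the more trustworthy modality.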
ISSN: 0196-2892, 1558-0644
DOI: 10.1109/TGRS.2024.3367934