SCNet: A Lightweight and Efficient Object Detection Network for Remote Sensing


Detailed Description

Saved in:
Bibliographic Details
Published in: IEEE Geoscience and Remote Sensing Letters, 2024, Vol. 21, pp. 1-5
Main Authors: Zhu, Shiliang; Miao, Min
Format: Article
Language: English
Description
Abstract: Detecting small objects in remote sensing images is meaningful yet challenging, especially when deploying existing object detection models on edge devices with limited hardware resources. In this study, we present an efficient remote sensing object detection model named SCNet, based on the ultra-lightweight YOLOv5n (you only look once). To address the significant loss of small-object features in the model's neck, we introduce the selective feature enhancement block (SFEB). The SFEB selectively processes the portion of the feature maps that contributes more to semantic information extraction while retaining the remainder, extracting rich semantic information while preserving the crucial detail information needed for small object detection. Furthermore, we incorporate the contextual transformer block (CTB) at the junction of the backbone and neck; by exploring contextual information in shallow feature maps, the CTB enhances the model's ability to understand the relationships and boundaries between objects and background, improving detection of challenging small and medium objects. Experimental results on the NWPU VHR-10 and DIOR datasets demonstrate the model's performance, achieving mean average precisions (mAPs) of 96.6% and 72.6% at IoU = 0.5. The model runs at 487 frames/s with a batch size of 32 (FPS32) while requiring only 4.6 giga floating-point operations (GFLOPs) and 1.8 million parameters.
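
The SFEB's split/retain/merge idea described in the abstract can be illustrated with a short sketch. The following is a minimal, hypothetical PyTorch module based only on the abstract's wording: it splits the input feature map along the channel dimension, enhances one portion with a small convolutional path, passes the other portion through unchanged to preserve detail, and fuses the result. The split ratio, the conv-BN-SiLU enhancement path, and the 1x1 fusion layer are assumptions, not the authors' published design.

    import torch
    import torch.nn as nn

    class SFEBSketch(nn.Module):
        # Hypothetical sketch of the selective feature enhancement block (SFEB).
        # Only the split/retain/concatenate idea comes from the abstract; every
        # layer choice below (split ratio, conv-BN-SiLU path, 1x1 fusion) is assumed.
        def __init__(self, channels: int, split_ratio: float = 0.5):
            super().__init__()
            self.proc_ch = int(channels * split_ratio)  # channels sent through the enhancement path
            self.keep_ch = channels - self.proc_ch      # channels kept as-is to preserve detail
            self.enhance = nn.Sequential(
                nn.Conv2d(self.proc_ch, self.proc_ch, 3, padding=1, bias=False),
                nn.BatchNorm2d(self.proc_ch),
                nn.SiLU(),
            )
            self.fuse = nn.Conv2d(channels, channels, 1)  # assumed 1x1 fusion after concatenation

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            proc, keep = torch.split(x, [self.proc_ch, self.keep_ch], dim=1)
            return self.fuse(torch.cat([self.enhance(proc), keep], dim=1))

    # Example: a 64-channel neck feature map at 80x80 resolution.
    x = torch.randn(1, 64, 80, 80)
    y = SFEBSketch(64)(x)  # shape preserved: (1, 64, 80, 80)

Processing only a fraction of the channels cuts the block's compute roughly in proportion to the split ratio, which is consistent with the letter's lightweight goal (4.6 GFLOPs, 1.8 million parameters).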
ISSN: 1545-598X
eISSN: 1558-0571
DOI: 10.1109/LGRS.2023.3344937