PR-Deformable DETR: DETR for Remote Sensing Object Detection

Bibliographic details
Published in: IEEE Geoscience and Remote Sensing Letters, 2024, Vol. 21, p. 1-5
Main authors: Chen, Yuepeng; Liu, Bojun; Yuan, Luying
Format: Article
Language: English
Description
Summary: Identifying objects in remote sensing images remains a critical challenge. However, remote sensing images typically encompass numerous small objects, significant variations in object sizes, and a dispersed distribution of objects, all of which pose challenges to the performance of existing object detectors. We present PR-Deformable DEtection Transformer (DETR), a novel model for remote sensing object detection to address these challenges. First, we introduce the tridirectional adaptive feature fusion pyramid network (TAFFPN) feature pyramid module to adaptively fuse data from diverse feature map layers, thereby enhancing the model's multiscale representation capability. Second, we propose the Res-Deformable Encoder, which integrates deformable encoders across different input scales via residual connections, generating feature vectors that capture rich semantic information of remote sensing objects. Last, we introduce the dynamic reference point module (DRPM) Decoder, which leverages 4-D reference points enriched with high-level (HL) feature priors to strengthen the model's object localization capabilities. Experimental results demonstrate that PR-Deformable DETR achieves state-of-the-art remote sensing object detection accuracy, achieving 88.3% mean average precision (mAP) on the NWPU VHR-10 dataset and 95.1% mAP on the RSOD dataset, with a corresponding 16% reduction in GFLOPs. These results satisfy the performance standards required for remote sensing object detection tasks.
ISSN: 1545-598X, 1558-0571
DOI: 10.1109/LGRS.2024.3483217