Self-Training-Based Semantic-Balanced Network for Weakly Supervised Object Detection in Remote-Sensing Images
Published in: | IEEE Transactions on Geoscience and Remote Sensing 2024, Vol.62, p.1-12 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | Weakly supervised object detection (WSOD) trains a detector using only image-level labels. Beyond the training difficulty introduced by weaker annotations, the inherent complexity of remote-sensing images (RSIs) adds to the challenge. To boost the detector's localization accuracy, we aim to exploit more of the semantic information contained in images and to improve the general robustness of the model. Noting that previous methods tend to focus on the most discriminative part of an object, we design a self-training-based network that leverages local semantic features. To this end, we develop a semantic-balanced localization module (SBLM) that distinguishes foreground from background, and accurate proposals from incomplete ones, by leveraging a balance between a region of interest (ROI) and its context information. Moreover, we find that the self-training strategy relies heavily on the quality of the pseudo-ground-truth boxes. Motivated by this potential lack of robustness, we design a comprehensive clustering module (CCM) and a saliency-based proposal filtering (SPF) module that select pseudo-ground truth more comprehensively under supervision. More specifically, CCM aims to reduce arbitrariness when assigning pseudo-labels by considering multiple categorical vectors simultaneously, and salient object detection (SOD) is applied in the SPF module to help evaluate the quality of the chosen pseudo-ground-truth boxes. The proposed method significantly boosts detection performance. Extensive experiments on the NWPU VHR-10.v2 and DIOR datasets show that the proposed model outperforms previous state-of-the-art methods, with mAPs of 64.9% and 28.1%, respectively. |
---|---|
ISSN: | 0196-2892 1558-0644 |
DOI: | 10.1109/TGRS.2023.3341061 |