From Discriminant to Complete: Reinforcement Searching-Agent Learning for Weakly Supervised Object Detection

Weakly supervised object detection (WSOD) is an interesting yet challenging task in the computer vision community. The core is to discover the image regions that contain the complete object instances under the image-level supervision. Existing works usually solve this problem via a proposal selectio...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:IEEE transaction on neural networks and learning systems 2020-12, Vol.31 (12), p.5549-5560
Hauptverfasser: Zhang, Dingwen, Han, Junwei, Zhao, Long, Zhao, Tao
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Weakly supervised object detection (WSOD) is an interesting yet challenging task in the computer vision community. The core is to discover the image regions that contain the complete object instances under the image-level supervision. Existing works usually solve this problem via a proposal selection strategy, which selects the most discriminative box regions from the weakly labeled training images. However, these regions usually only contain the discriminative object parts rather than the complete object instances. To address this problem, this article proposes to learn a searching-agent to gradually mine desirable object regions under a region searching paradigm, where we formulate the searching process as a Markov decision process and learn the searching-agent under a deep reinforcement learning framework. To learn such a searching-agent under the weak supervision, we extract the pseudo-complete object regions and the corresponding local discriminative object parts and introduce the obtained pseudo-target-part training pairs into the reinforcement learning process of the search-agent. This learning strategy has twofold advantages: 1) it can mimic the searching process to reveal complete object regions from a certain discriminative part of the object under the weak supervision and 2) it will not suffer from the learning difficulty arise from the long-action sequence that happens when searching from the entire image range. Comprehensive experiments on benchmark data sets demonstrate that by integrating the learned searching-agent with the existing WSOD method, we can achieve better performance than the other state-of-the-art and baseline methods.
ISSN:2162-237X
2162-2388
DOI:10.1109/TNNLS.2020.2969483