Real-Time Weakly Supervised Object Detection Using Center-of-Features Localization

Bibliographic details
Published in: IEEE Access 2021, Vol. 9, p. 38742-38756
Main authors: Ibrahem, Hatem; Salem, Ahmed Diefy Ahmed; Kang, Hyun-Soo
Format: Article
Language: English
Online access: Full text
Description
Abstract: We propose a high-speed convolutional neural network approach for weakly supervised localization (WSL) and weakly supervised object detection (WSOD). The proposed method, called center-of-features localization (COFL), localizes objects in a visual scene by combining multi-label classification with regression on the number of instances of each object class. A modified Xception network architecture is used as the main feature extractor, and a classification-plus-regression loss function is used to perform the detection task. The method does not require bounding box annotations, only image labels and the count of objects of each class in the image. This combination can produce a clear localization of objects in the scene through a masking technique between class activation maps (CAMs) and regression activation maps (RAMs). The proposed method was trained and tested on the PASCAL VOC2007 and VOC2012 datasets; it attained a mean average precision (mAP) of 47.0% and a correct localization (CorLoc) of 64.1% on PASCAL VOC2007, and a mAP of 42.3% and a CorLoc of 65.5% on PASCAL VOC2012, while performing object detection at a speed of ~50 fps. These results demonstrate that the network can perform object detection accurately in real time using only image labels and object counts, which are inexpensive to annotate compared with the bounding box annotations typically employed in fully supervised object detection methods. The network far outperforms other weakly supervised methods and some fully supervised methods in terms of processing time while achieving fair accuracy.
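The abstract names a masking step between CAMs and RAMs but does not spell out the rule. The sketch below is one possible reading, not the authors' implementation: it assumes the mask is a thresholded, min-max-normalized CAM, and that the "center of features" is the activation-weighted centroid of the RAM inside that mask. The function names `normalize` and `masked_center` and the `threshold` parameter are hypothetical.

```python
import numpy as np


def normalize(m):
    """Min-max scale a 2-D activation map to [0, 1]; a constant map becomes all zeros."""
    m = np.asarray(m, dtype=float)
    span = m.max() - m.min()
    return (m - m.min()) / span if span > 0 else np.zeros_like(m)


def masked_center(cam, ram, threshold=0.5):
    """Return the (row, col) centroid of the RAM weighted inside the CAM mask.

    Assumption: the mask keeps cells where the normalized CAM reaches
    `threshold`; the rule used in the paper may differ.
    """
    mask = normalize(cam) >= threshold
    weights = normalize(ram) * mask
    total = weights.sum()
    if total == 0:
        return None  # no activation survives the mask
    rows, cols = np.indices(weights.shape)
    return (float((rows * weights).sum() / total),
            float((cols * weights).sum() / total))
```

For a single class, `cam` and `ram` would be the two activation maps for that class resized to the same shape; an estimated center could then be combined with the predicted count to place one detection per instance, though how the paper does this is not stated in the abstract.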
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2021.3064372