Hierarchical complementary learning for weakly supervised object localization

Bibliographic details
Published in:	Signal processing. Image communication 2022-01, Vol.100, p.116520, Article 116520
Main authors:	Benassou, Sabrina Narimene; Shi, Wuzhen; Jiang, Feng; Benzine, Abdallah
Format:	Article
Language:	English
Subjects:	Class activation map; Complementary map; Fusion strategy; Learning; Localization; Mathematical models; Parameters; Weakly supervised object localization
Online access:	Full text

Description
Summary:	Weakly supervised object localization (WSOL) is a challenging problem that aims to localize objects without ground-truth bounding boxes. A common approach is to train a model that generates a class activation map (CAM) to localize the discriminative features of the object. Unfortunately, this approach tends to detect only a part of the object rather than the whole object. To solve this problem, previous works have removed some parts of the image (Zhang et al., 2018; Zhang et al., 2018; Singh and Lee, 2017; Choe and Shim, 2019) to force the model to detect the full object extent. However, these methods require one or more hyper-parameters to erase the appropriate pixels in the image, which can cause a loss of information. In this paper, we propose a Hierarchical Complementary Learning Network (HCLNet) that helps the CNN perform better on classification and localization. HCLNet uses a complementary CAM to generate multiple maps that detect different parts of the object. Unlike previous works, this method needs no extra hyper-parameters and does not introduce a large loss of information. To fuse these maps, two fusion strategies are used: the addition strategy and the l1-norm strategy. These strategies make it possible to detect the whole object while excluding the background. Extensive experiments show that HCLNet obtains better performance than state-of-the-art methods.
• Weakly supervised object localization aims to localize objects using image labels.
• HCLNet hierarchically generates different class activation maps and fuses them.
• The addition strategy and the l1-norm strategy are introduced to fuse the CAMs.
• Extensive experiments show that HCLNet achieves new state-of-the-art performance.
ISSN:	0923-5965, 1879-2677
DOI:	10.1016/j.image.2021.116520
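
The summary names two ways of fusing several class activation maps, an addition strategy and an l1-norm strategy, but this record does not define either one. The following is a minimal sketch of one plausible reading, not the authors' implementation: addition is taken as an element-wise sum, and the l1-norm strategy is assumed to weight each map at every pixel by its share of the total absolute activation there. The names `fuse_addition`, `fuse_l1_norm`, and `eps` are hypothetical, and maps are plain nested lists to keep the example dependency-free.

```python
from typing import List

# A 2-D activation map as nested lists: rows of float activations.
Map = List[List[float]]


def fuse_addition(maps: List[Map]) -> Map:
    """Element-wise sum of equally sized maps.

    Assumption: this is what the summary's 'addition strategy' refers to.
    """
    rows, cols = len(maps[0]), len(maps[0][0])
    return [[sum(m[r][c] for m in maps) for c in range(cols)] for r in range(rows)]


def fuse_l1_norm(maps: List[Map], eps: float = 1e-12) -> Map:
    """Per-pixel weighted sum of equally sized maps.

    Assumption: each map's weight at a pixel is its absolute activation
    divided by the summed absolute activations of all maps at that pixel.
    `eps` is a hypothetical guard against division by zero.
    """
    rows, cols = len(maps[0]), len(maps[0][0])
    fused = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = sum(abs(m[r][c]) for m in maps) + eps
            fused[r][c] = sum(abs(m[r][c]) / total * m[r][c] for m in maps)
    return fused


if __name__ == "__main__":
    # Two toy 2x2 maps that respond to different regions of an image.
    cam_a = [[1.0, 0.0], [0.5, 0.0]]
    cam_b = [[3.0, 0.0], [0.0, 2.0]]
    print(fuse_addition([cam_a, cam_b]))
    print(fuse_l1_norm([cam_a, cam_b]))
```

Under this reading, the sum keeps every region that any map activates, while the weighted variant lets the strongest map dominate where maps overlap, which keeps the fused values within the range of the inputs.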