Enhanced Pseudo-Label Generation With Self-Supervised Training for Weakly-Supervised Semantic Segmentation
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2024-08, Vol. 34 (8), pp. 7017-7028
Format: Article
Language: English
Online Access: Order full text
Abstract: Due to the high cost of the pixel-level labels required for fully-supervised semantic segmentation, weakly-supervised segmentation has recently emerged as a more viable option. Existing weakly-supervised methods attempt to generate pseudo-labels for semantic segmentation without pixel-level labels, but a common problem is that the generated pseudo-labels contain insufficient semantic information, resulting in poor accuracy. To address this challenge, a novel method is proposed that generates class activation/attention maps (CAMs) containing sufficient semantic information as pseudo-labels for semantic segmentation training without pixel-level labels. In this method, an attention-transfer module is designed to preserve salient regions on CAMs while avoiding the suppression of inconspicuous regions of the targets, which yields pseudo-labels with sufficient semantic information. A pixel relevance focused-unfocused module is also developed to better integrate contextual information: attention mechanisms extract focused relevant pixels, while multi-scale atrous convolution expands the receptive field to establish connections between distant pixels. Experiments demonstrate that the proposed method achieves competitive performance in weakly-supervised segmentation and even outperforms many saliency-joined methods.
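The two mechanisms named in the abstract can be sketched in a few lines of PyTorch. The sketch below is illustrative only, not the paper's implementation: `MultiScaleAtrous` approximates a generic multi-scale atrous (dilated) convolution block of the kind the abstract describes for expanding the receptive field (the dilation rates and channel sizes are assumptions), and `cams_to_pseudo_labels` shows the standard step of thresholding normalized CAMs into hard pseudo-labels (the threshold value and background convention are likewise assumptions).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class MultiScaleAtrous(nn.Module):
    """Parallel 3x3 convolutions at several dilation rates, fused by a 1x1
    convolution. Larger rates enlarge the receptive field so distant pixels
    can influence each other without extra pooling or downsampling."""

    def __init__(self, in_ch, out_ch, rates=(1, 6, 12, 18)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=r, dilation=r)
            for r in rates
        )
        self.fuse = nn.Conv2d(out_ch * len(rates), out_ch, kernel_size=1)

    def forward(self, x):
        # Each branch sees the same input at a different effective
        # receptive field; concatenation mixes near and far context.
        feats = [branch(x) for branch in self.branches]
        return self.fuse(torch.cat(feats, dim=1))


def cams_to_pseudo_labels(cams, fg_threshold=0.3):
    """Turn per-class activation maps (B, C, H, W) into hard pseudo-labels.
    Pixels whose strongest normalized activation stays below `fg_threshold`
    are treated as background (label 0); class indices are shifted by one.
    The threshold is a hypothetical choice, not taken from the paper."""
    cams = F.relu(cams)
    # Normalize each map to [0, 1] so a single threshold is meaningful.
    maxv = cams.amax(dim=(2, 3), keepdim=True).clamp(min=1e-5)
    cams = cams / maxv
    scores, labels = cams.max(dim=1)   # (B, H, W)
    labels = labels + 1                # reserve 0 for background
    labels[scores < fg_threshold] = 0
    return labels


# Hypothetical usage with random tensors standing in for backbone features.
feats = torch.randn(2, 256, 32, 32)
fused = MultiScaleAtrous(256, 128)(feats)   # (2, 128, 32, 32)
cams = torch.randn(2, 20, 32, 32)           # 20 foreground classes
pseudo = cams_to_pseudo_labels(cams)        # (2, 32, 32) integer labels
```

The parallel-branch design trades a modest parameter increase for long-range context at full resolution, which is why atrous pyramids are a common way to connect distant pixels in segmentation networks.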
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2024.3364764