Enhancing weakly supervised semantic segmentation through multi-class token attention learning

Weakly supervised semantic segmentation using image-level class labels is challenging due to the limitations of Class Activation Maps (CAMs) in convolutional neural networks (CNNs), which often highlight only the most discriminative image regions. We propose the Hierarchical Multi-Class Token Attent...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	The Journal of supercomputing 2025, Vol.81 (1), Article 74
Hauptverfasser:	Luo, Huilan, Zeng, Zhen
Format:	Artikel
Sprache:	eng
Schlagworte:	Artificial neural networks Attention Compilers Computer Science Image enhancement Image segmentation Interpreters Labels Modules Processor Architectures Programming Languages Regularization Semantic segmentation
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Weakly supervised semantic segmentation using image-level class labels is challenging due to the limitations of Class Activation Maps (CAMs) in convolutional neural networks (CNNs), which often highlight only the most discriminative image regions. We propose the Hierarchical Multi-Class Token Attention Network (HMCTANet), a novel approach leveraging a Conformer backbone that integrates CNN and Transformer branches. HMCTANet enhances CAMs through multi-class token attention and a Class-Aware Training (CAT) strategy that aligns class tokens with ground-truth labels. Additionally, we introduce a Class Token Regularization Module (CTRM) to improve the discriminative power of class tokens. Our Refinement Module (RM) further refines segmentation by combining class-specific attention and patch-level affinity from the Transformer branch with the CAMs from the CNN branch. HMCTANet achieves state-of-the-art performance, with mIoU scores of 69.0 and 68.4% on the PASCAL VOC 2012 validation and test sets, respectively, demonstrating the effectiveness of our approach for WSSS tasks.
ISSN:	0920-8542 1573-0484
DOI:	10.1007/s11227-024-06618-4