Enhancing weakly supervised semantic segmentation through multi-class token attention learning

Weakly supervised semantic segmentation using image-level class labels is challenging due to the limitations of Class Activation Maps (CAMs) in convolutional neural networks (CNNs), which often highlight only the most discriminative image regions. We propose the Hierarchical Multi-Class Token Attent...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:The Journal of supercomputing 2025, Vol.81 (1), Article 74
Hauptverfasser: Luo, Huilan, Zeng, Zhen
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Weakly supervised semantic segmentation using image-level class labels is challenging due to the limitations of Class Activation Maps (CAMs) in convolutional neural networks (CNNs), which often highlight only the most discriminative image regions. We propose the Hierarchical Multi-Class Token Attention Network (HMCTANet), a novel approach leveraging a Conformer backbone that integrates CNN and Transformer branches. HMCTANet enhances CAMs through multi-class token attention and a Class-Aware Training (CAT) strategy that aligns class tokens with ground-truth labels. Additionally, we introduce a Class Token Regularization Module (CTRM) to improve the discriminative power of class tokens. Our Refinement Module (RM) further refines segmentation by combining class-specific attention and patch-level affinity from the Transformer branch with the CAMs from the CNN branch. HMCTANet achieves state-of-the-art performance, with mIoU scores of 69.0 and 68.4% on the PASCAL VOC 2012 validation and test sets, respectively, demonstrating the effectiveness of our approach for WSSS tasks.
ISSN:0920-8542
1573-0484
DOI:10.1007/s11227-024-06618-4