Enhancing Unsupervised Semantic Segmentation Through Context-Aware Clustering

Bibliographic Details
Published in: IEEE Transactions on Multimedia, 2024, Vol. 26, pp. 10081-10093
Main authors: Zhuo, Wei; Wang, Yuan; Chen, Junliang; Deng, Songhe; Wang, Zhi; Shen, Linlin; Zhu, Wenwu
Format: Article
Language: English
Description
Abstract: Despite the great progress of semantic segmentation with supervised learning, annotating large amounts of pixel-wise labels is very expensive and time-consuming. To this end, Unsupervised Semantic Segmentation (USS) has been proposed to learn semantic segmentation without any form of annotation. This approach requires dense prediction of semantics, which is challenging due to the unreliable nature of local representations. To address this problem, we propose a new context-aware unsupervised semantic segmentation framework that enhances unsupervised semantic segmentation by leveraging contextual knowledge within and across images. In particular, we introduce a training strategy based on our Pyramid Semantic Guidance (PSG), which uses holistic semantics on pyramid views to guide pixel clustering within a siamese network-based framework. Additionally, we introduce a Context-Aware Embedding (CAE) module to fuse global features with low-level geometric and appearance representations. We evaluate our method on the COCO-Stuff dataset and achieve competitive results compared to both convolutional and ViT-based USS methods. Specifically, we attain significant improvements of +4.5% and +5% mIoU for Stuff and all-class segmentation, respectively, compared to previous approaches that employ unsupervised convolutional backbones.
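
To make the abstract's two components concrete, the following is a minimal, hypothetical PyTorch sketch of what a PSG-style pyramid-view consistency objective and a CAE-style context fusion module could look like. Everything below (the names ContextAwareEmbedding and pyramid_guidance_loss, the soft prototype assignments, the pooling-and-concatenation fusion, and all shapes) is an assumption inferred from the abstract alone, not the authors' published implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F


class ContextAwareEmbedding(nn.Module):
    """CAE-style fusion (hypothetical): broadcast a globally pooled
    context vector over low-level geometric/appearance features and
    project the concatenation back down with a 1x1 convolution."""

    def __init__(self, low_dim: int, global_dim: int, out_dim: int):
        super().__init__()
        self.proj = nn.Conv2d(low_dim + global_dim, out_dim, kernel_size=1)

    def forward(self, low_feats, global_feats):
        # Image-level context: average-pool the high-level features,
        # then tile the resulting vector over every pixel location.
        ctx = global_feats.mean(dim=(2, 3), keepdim=True)
        ctx = ctx.expand(-1, -1, *low_feats.shape[2:])
        return self.proj(torch.cat([low_feats, ctx], dim=1))


def pyramid_guidance_loss(encoder, prototypes, image, scales=(1.0, 0.5)):
    """PSG-style objective (hypothetical): one shared (siamese) encoder
    embeds every pyramid view; soft cluster assignments are computed
    against shared prototypes; the holistic assignment of each coarser
    view serves as a soft target for the full-resolution clustering."""
    protos = F.normalize(prototypes, dim=1)
    assigns = []
    for s in scales:
        view = image if s == 1.0 else F.interpolate(
            image, scale_factor=s, mode="bilinear", align_corners=False)
        feats = F.normalize(encoder(view), dim=1)             # (B, D, h, w)
        logits = torch.einsum("bdhw,kd->bkhw", feats, protos)
        # Resize every view's assignment map to full resolution.
        logits = F.interpolate(logits, size=image.shape[2:],
                               mode="bilinear", align_corners=False)
        assigns.append(logits)
    full_log_p = F.log_softmax(assigns[0], dim=1)
    loss = image.new_zeros(())
    for coarse in assigns[1:]:
        target = F.softmax(coarse.detach(), dim=1)            # stop-gradient guidance
        loss = loss - (target * full_log_p).sum(dim=1).mean()
    return loss / max(len(assigns) - 1, 1)


# Toy usage (all modules and sizes are placeholders):
encoder = nn.Conv2d(3, 64, kernel_size=3, padding=1)   # stand-in for a real backbone
prototypes = torch.randn(27, 64)                       # hypothetical cluster prototypes
loss = pyramid_guidance_loss(encoder, prototypes, torch.randn(2, 3, 128, 128))
loss.backward()

The stop-gradient on each coarse view's assignment mirrors common siamese-network practice: the holistic branch provides a stable soft target while the full-resolution branch is trained toward it.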
ISSN: 1520-9210
eISSN: 1941-0077
DOI: 10.1109/TMM.2024.3405648