Enhancing Unsupervised Semantic Segmentation Through Context-Aware Clustering
Published in: | IEEE Transactions on Multimedia 2024, Vol. 26, p. 10081-10093 |
---|---|
Main authors: | , , , , , , |
Format: | Article |
Language: | eng |
Abstract: | Despite the great progress of supervised semantic segmentation, annotating large amounts of pixel-wise labels is very expensive and time-consuming. To this end, Unsupervised Semantic Segmentation (USS) has been proposed to learn semantic segmentation without any form of annotation. This approach involves dense prediction of semantics, which is challenging because local representations are unreliable. To solve this problem, we propose a new context-aware unsupervised semantic segmentation framework that enhances unsupervised semantic segmentation by leveraging contextual knowledge within and across images. In particular, we introduce a training strategy based on our Pyramid Semantic Guidance (PSG), which uses holistic semantics on pyramid views to guide pixel clustering within a Siamese network-based framework. Additionally, we introduce a Context-Aware Embedding (CAE) module to fuse global features with low-level geometrical and appearance representations. We evaluate our method on the COCO-Stuff dataset and achieve competitive results compared to both convolutional and ViT-based USS methods. Specifically, we attain significant improvements of +4.5% and +5% mIoU for Stuff and all-class segmentation, respectively, compared to previous approaches that employ unsupervised convolutional backbones. |
---|---|
ISSN: | 1520-9210, 1941-0077 |
DOI: | 10.1109/TMM.2024.3405648 |