TransFGU: A Top-down Approach to Fine-Grained Unsupervised Semantic Segmentation
Format: | Article |
Language: | English |
Abstract: | Unsupervised semantic segmentation aims to obtain high-level semantic
representations from low-level visual features without manual annotations. Most
existing methods are bottom-up approaches that try to group pixels into regions
based on their visual cues or certain predefined rules. As a result, it is
difficult for these bottom-up approaches to generate fine-grained semantic
segmentation in complicated scenes that contain multiple objects, some of which
share a similar visual appearance. In contrast, we propose the first
top-down unsupervised semantic segmentation framework for fine-grained
segmentation in extremely complicated scenarios. Specifically, we first obtain
rich high-level structured semantic concept information from large-scale vision
data in a self-supervised manner, and use this information as a prior
to discover potential semantic categories present in the target datasets.
Secondly, the discovered high-level semantic categories are mapped to low-level
pixel features by calculating the class activation map (CAM) with respect to
a given discovered semantic representation. Lastly, the obtained CAMs serve as
pseudo labels to train the segmentation module and produce the final semantic
segmentation. Experimental results on multiple semantic segmentation benchmarks
show that our top-down unsupervised segmentation is robust across both
object-centric and scene-centric datasets at different levels of semantic
granularity, and outperforms all current state-of-the-art bottom-up methods. Our
code is available at https://github.com/damo-cv/TransFGU. |
DOI: | 10.48550/arxiv.2112.01515 |
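The abstract describes a three-stage top-down flow: discover semantic categories, compute a CAM for each discovered category over the pixel (patch) features, then use those CAMs as pseudo labels. The Python sketch below only illustrates that flow on toy data under stated assumptions and is not the TransFGU implementation. It assumes patch features have already been produced by some self-supervised backbone (here they are hand-written 2-D vectors), substitutes plain k-means for category discovery, and approximates each "CAM" as the clipped cosine similarity between every patch feature and a category centroid. All function names (`kmeans`, `activation_maps`, `pseudo_labels`) are hypothetical.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def kmeans(features, k, iters=20, seed=0):
    """Plain k-means with cosine assignment: a stand-in (assumption) for category discovery."""
    rng = random.Random(seed)
    centroids = [list(f) for f in rng.sample(features, k)]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for f in features:
            # Assign each feature to its most similar centroid.
            best = max(range(k), key=lambda c: cosine(f, centroids[c]))
            groups[best].append(f)
        for c, members in enumerate(groups):
            if members:  # keep the previous centroid if a cluster becomes empty
                dim = len(members[0])
                centroids[c] = [sum(m[d] for m in members) / len(members) for d in range(dim)]
    return centroids


def activation_maps(patch_grid, centroids):
    """CAM-like maps (assumption): one H x W map of clipped cosine similarity per category."""
    return [
        [[max(0.0, cosine(f, c)) for f in row] for row in patch_grid]
        for c in centroids
    ]


def pseudo_labels(maps):
    """Per-patch argmax over the category maps, giving a hard pseudo-label map."""
    h, w = len(maps[0]), len(maps[0][0])
    return [
        [max(range(len(maps)), key=lambda k: maps[k][i][j]) for j in range(w)]
        for i in range(h)
    ]


# Toy usage: a 2 x 3 grid of hypothetical 2-D patch features with two distinct regions.
grid = [
    [[1.0, 0.1], [0.9, 0.0], [0.1, 1.0]],
    [[1.0, 0.2], [0.0, 0.9], [0.2, 1.0]],
]
features = [f for row in grid for f in row]
centroids = kmeans(features, k=2)
maps = activation_maps(grid, centroids)
labels = pseudo_labels(maps)
print(labels)
```

The numeric cluster IDs depend on the random initialization, but patches with similar features end up sharing a label. Per the abstract, such pseudo-label maps would supervise a separate segmentation module rather than serve as the final output.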