Dual semantic-guided model for weakly-supervised zero-shot semantic segmentation
Saved in:
Published in: Multimedia Tools and Applications, 2022-02, Vol. 81 (4), p. 5443-5458
Main authors: , , ,
Format: Article
Language: English
Subjects:
Online access: Full text
Abstract: The major obstacle in semantic segmentation is that it requires a large amount of pixel-level labeled data to train an effective model. To reduce the cost of annotation, weakly-supervised methods use weaker labels to overcome the need for per-pixel labels, while zero-shot methods transfer knowledge learned from seen classes to unseen classes to reduce the number of classes that need to be labeled. To further alleviate the annotation burden, we introduce a more challenging task, Weakly-supervised Zero-shot Semantic Segmentation (WZSS): learning models that use only image-level annotations of seen classes to segment images containing unseen objects. To this end, we propose a Dual Semantic-Guided (DSG) model that is doubly guided by the semantic embeddings of classes to obtain both classification scores and localization maps. By ignoring localization maps with low classification scores, the proposed framework generates predicted segmentation masks. To further improve performance, we propose a simple stochastic selection over semantic embeddings during inference, which exploits the difference between image-level class embeddings and pixel-level class embeddings. This simple approach raises our model's hIoU from 25.9 to 31.8. In addition, compared with several zero-shot semantic segmentation methods, our method delivers better results in terms of hIoU (31.8) and mIoU (22.0) on the PASCAL VOC 2012 dataset while using less supervision.
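The abstract describes the pipeline only at a high level, so the NumPy sketch below is an illustrative reading rather than the authors' implementation: the function names (predict_mask, stochastic_select), the pooling-plus-sigmoid scoring head, the threshold value, and the Bernoulli selection rule are all assumptions. What it does reflect from the abstract is the dual use of one set of class embeddings to produce both image-level scores and pixel-level localization maps, which is what makes score-based gating of the maps possible.

```python
import numpy as np

# Hypothetical setup: C classes, D-dimensional semantic embeddings, and a
# D x H x W visual feature map; random stand-ins replace a real backbone.
rng = np.random.default_rng(0)
C, D, H, W = 5, 300, 32, 32
class_embeddings = rng.normal(size=(C, D))   # e.g., word vectors per class
features = rng.normal(size=(D, H, W))        # stand-in backbone features


def predict_mask(features, embeddings, score_threshold=0.5):
    """Gate per-class localization maps by image-level classification scores.

    Loosely mirrors the "dual semantic-guided" idea: project features onto
    each class embedding to get localization maps, pool them into
    classification scores, then ignore the maps of low-scoring classes.
    """
    # Per-class localization maps via dot products with class embeddings.
    loc_maps = np.einsum('cd,dhw->chw', embeddings, features)
    # Image-level scores: global average pooling followed by a sigmoid
    # (an assumed choice; the abstract does not specify the scoring head).
    scores = 1.0 / (1.0 + np.exp(-loc_maps.mean(axis=(1, 2))))
    # Suppress localization maps whose classification score is low.
    gated = np.where(scores[:, None, None] >= score_threshold,
                     loc_maps, -np.inf)
    # Per-pixel argmax over the surviving classes yields the predicted mask.
    return gated.argmax(axis=0), scores


def stochastic_select(image_level_emb, pixel_level_emb, p=0.5, rng=rng):
    """Per class, randomly pick between two embedding variants at inference.

    The abstract only says a "simple stochastic selection" explores the gap
    between image-level and pixel-level class embeddings; the Bernoulli
    mixing below is one plausible reading, not the paper's exact rule.
    """
    pick = rng.random(image_level_emb.shape[0]) < p
    return np.where(pick[:, None], image_level_emb, pixel_level_emb)


mask, scores = predict_mask(features, class_embeddings)
print(mask.shape, scores.round(2))  # (32, 32) and one score per class
```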
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-021-11792-1