Weakly Supervised Instance Segmentation by Exploring Entire Object Regions

Weakly supervised instance segmentation with image-level class supervision is a challenging task as it associates the highest-level instances to the lowest-level appearance. Previous approaches for the task utilize classification networks to obtain rough discriminative parts as seed regions and use...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on multimedia 2023, Vol.25, p.352-363
Hauptverfasser:	Zhang, Ke, Yuan, Chun, Zhu, Yiming, Jiang, Yong, Luo, Lishu
Format:	Artikel
Sprache:	eng
Schlagworte:	center detection Clustering Image segmentation Instance segmentation integration module Labels Learning Location awareness Pixels pseudo instance segmentation labels Saliency detection Semantics Streaming media Task analysis Training two-stream network Weakly supervised learning
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Weakly supervised instance segmentation with image-level class supervision is a challenging task as it associates the highest-level instances to the lowest-level appearance. Previous approaches for the task utilize classification networks to obtain rough discriminative parts as seed regions and use distance as a metric to cluster pixels of the same instances. Unlike previous approaches, we provide a novel self-supervised joint learning framework as the basic network and consider the clustering problem as calculating the probability that pixels belong to each instance. To this end, we propose our self-supervised joint learning two-stream network (SJLT Net) to finish this task. In the first stream, we leverage a joint learning framework to implement image-level supervised semantic segmentation with self-supervised saliency detection. In the second stream, we propose a Center Detection Network to detect different instances' centers with the gaussian loss function to cluster instances pixels. Besides, an integration module is utilized to combine information of both streams and get precise pseudo instances labels. Our approach generates pseudo instance segmentation labels of training images, which are used to train a fully supervised model. Our model achieves excellent performance on the PASCAL VOC 2012 dataset, surpassing the best baseline trained with the same labels by 4.6\% AP^r_{50} on the train set and 2.6\% AP^r_{50} on the validation set.
ISSN:	1520-9210 1941-0077
DOI:	10.1109/TMM.2021.3126430