Webly-supervised semantic segmentation via curriculum learning

Bibliographic Details
Published in: Computer vision and image understanding, 2023-11, Vol. 236, p. 103810, Article 103810
Main Authors: Huang, Zuxian; Wu, Gangshan; Wang, Limin
Format: Article
Language: English
Online Access: Full text
Description
Summary: In this paper, we propose a weakly supervised semantic segmentation method that learns directly from web images, which are crawled from the Internet using text queries, without any explicit user annotation or even data filtering. To handle the massive amount of noisy labels in web images, we design a three-stage approach for weakly supervised semantic segmentation based on curriculum learning. We first generate pixel-level masks for the training images via a popular weakly supervised semantic segmentation framework. Then, we consider the noise of the web data in two ways. At the image level, the complexity of the data is measured using its distribution density in a classification feature space. At the pixel level, the complexity of the mask is evaluated in an unsupervised manner by exploiting the relationship between the saliency map and the segmented image. The key insight behind this design is that common and simple object patterns in images should be salient to both the saliency detector and the weakly supervised DCNNs, and they should be sparse, with high regional consistency between the two. This allows for an efficient implementation of curriculum learning from noisy web images. Experiments on the popular PASCAL VOC 2012 benchmark show that we achieve very competitive performance of 64.0% mIoU using our pure web dataset, which contains noisy, single-label images. We further improve the performance to 69.7% mIoU by fine-tuning CurriculumWebSegNet on the PASCAL VOC dataset, which has more precise multi-label supervision.
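The image-level step in the summary (ranking web images by their distribution density in a classification feature space, then training from easy to hard) can be illustrated with a minimal sketch. The summary does not specify the density estimator, so the k-nearest-neighbour density below, the function names `density_scores` and `curriculum_stages`, and the parameters `k` and `n_stages` are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def density_scores(features, k=5):
    """Assumed density proxy: inverse of the mean distance to the k nearest neighbours."""
    features = np.asarray(features, dtype=float)
    # Pairwise Euclidean distances between all feature vectors.
    diffs = features[:, None, :] - features[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the zero self-distance; skip it.
    knn = np.sort(dists, axis=1)[:, 1:k + 1]
    return 1.0 / (knn.mean(axis=1) + 1e-12)


def curriculum_stages(features, n_stages=3, k=5):
    """Split sample indices into stages, densest (assumed easiest) first."""
    scores = density_scores(features, k)
    order = np.argsort(-scores)  # descending density
    return [chunk.tolist() for chunk in np.array_split(order, n_stages)]


if __name__ == "__main__":
    # Hypothetical 2-D "features": a tight cluster plus three outliers.
    feats = [[0, 0], [0.1, 0], [0, 0.1], [0.1, 0.1], [0.05, 0.05],
             [0.02, 0.08], [10, 10], [-10, 10], [10, -10]]
    for stage, idx in enumerate(curriculum_stages(feats, n_stages=3, k=2)):
        print(f"stage {stage}: {idx}")
```

Under this assumption, images in dense regions of the feature space are treated as clean, typical examples and are used first, while isolated images (likely noisy or atypical) are deferred to later stages.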
• We propose a weakly supervised semantic segmentation method by directly learning from web images without relying on any explicit user annotations or even data pre-selection.
• We propose a novel data complexity measurement for the webly-supervised semantic segmentation task, using an easy-to-hard curriculum learning strategy.
• The proposed CurriculumWebSeg outperforms other pure web segmentors on the PASCAL VOC 2012 benchmark.
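As a rough illustration of the pixel-level side of the data complexity measurement, the agreement between a saliency map and a generated mask could be scored with an intersection-over-union. The record does not give the actual formula, so the IoU-based score, the function name `mask_saliency_consistency`, the `threshold` parameter, and the convention that label 0 means background are all assumptions for this sketch.

```python
import numpy as np


def mask_saliency_consistency(mask, saliency, threshold=0.5):
    """Assumed consistency score: IoU of mask foreground and thresholded saliency."""
    fg = np.asarray(mask) > 0  # assumption: label 0 is background
    sal = np.asarray(saliency, dtype=float) >= threshold
    union = np.logical_or(fg, sal).sum()
    if union == 0:
        # Neither map marks any foreground; treat them as fully consistent.
        return 1.0
    return float(np.logical_and(fg, sal).sum() / union)


if __name__ == "__main__":
    mask = np.zeros((4, 4), dtype=int)
    mask[0:2, 0:2] = 3            # hypothetical class label 3
    saliency = np.zeros((4, 4))
    saliency[0:2, 1:3] = 0.9      # salient region shifted by one column
    print(mask_saliency_consistency(mask, saliency))
```

A high score would mark a mask as simple (the saliency detector and the segmentation network agree), so it could be scheduled early in an easy-to-hard curriculum; a low score would defer it.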
ISSN: 1077-3142, 1090-235X
DOI: 10.1016/j.cviu.2023.103810