LPCL: Localized Prominence Contrastive Learning for Self-Supervised Dense Visual Pre-training


Detailed Description

Bibliographic Details
Published in: Pattern recognition 2023-03, Vol. 135, p. 109185, Article 109185
Authors: Chen, Zihan, Zhu, Hongyuan, Cheng, Hao, Mi, Siya, Zhang, Yu, Geng, Xin
Format: Article
Language: English
Online Access: Full text
Description
Summary:
•We introduce a novel self-supervised pre-training model, named LPCL, which uses a localized prominence heuristic to learn effectively on non-iconic multi-instance datasets for dense prediction tasks.
•We propose a novel and efficient online objectness patch selection module that guides contrastive learning to focus on the localized patches that are likely to contain objects.
•Benefiting from multi-level contrastive learning that jointly accounts for global features, local features, and local localization information, LPCL effectively improves performance on dense prediction tasks.

Self-supervised pre-training has attracted increasing attention given its promising performance in training backbone networks without labels. To date, most methods focus on image classification with datasets containing iconic objects and simple backgrounds, e.g. ImageNet. However, these methods show sub-optimal performance on dense prediction tasks (e.g. object detection and scene parsing) when pre-trained directly on datasets with multiple objects and cluttered backgrounds (e.g. PASCAL VOC and COCO). Researchers have explored self-supervised dense pre-training methods by adapting recent image pre-training methods; nevertheless, these require a large number of negative samples and a long training time to reach reasonable performance. In this paper, we propose LPCL, a novel self-supervised representation learning method for dense prediction that addresses these issues. To exploit the instance information in multi-instance datasets, we define an online object patch selection module that efficiently selects, during learning, the local patches in the augmented views most likely to contain instance areas. After obtaining the patches, we present a novel multi-level contrastive learning method that considers instance representations at the global, local, and position levels without using negative samples.
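The abstract's two core ideas — selecting high-objectness patches online, and combining global- and local-level contrastive terms without negative samples — can be sketched as follows. This is a minimal illustration, not the paper's implementation: the feature-norm objectness proxy, the top-k selection, the loss weighting `w_local`, and all function names are assumptions for exposition; the paper's actual module and loss details are not specified in this record. The BYOL-style term is the standard negative-cosine-similarity loss between a prediction and a stop-gradient target, which uses no negatives.

```python
import numpy as np

def l2_normalize(x, axis=-1):
    """Normalize feature vectors to unit length (small eps avoids divide-by-zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + 1e-8)

def byol_loss(pred, target):
    """BYOL-style loss: 2 - 2 * cos(pred, target), averaged; no negative samples."""
    p, z = l2_normalize(pred), l2_normalize(target)
    return float(np.mean(2.0 - 2.0 * np.sum(p * z, axis=-1)))

def select_object_patches(patch_feats, k):
    """Hypothetical objectness proxy: rank patches by feature norm, keep top-k indices."""
    scores = np.linalg.norm(patch_feats, axis=-1)
    return np.argsort(scores)[::-1][:k]

def multi_level_loss(global_pred, global_target, patch_pred, patch_target,
                     k=4, w_local=1.0):
    """Combine a global-level term with a local-level term over selected patches."""
    g = byol_loss(global_pred, global_target)
    idx = select_object_patches(patch_target, k)   # online patch selection
    l = byol_loss(patch_pred[idx], patch_target[idx])
    return g + w_local * l

rng = np.random.default_rng(0)
# toy shapes: batch of 2 global embeddings, 49 patch embeddings (7x7 grid), dim 32
gp, gz = rng.normal(size=(2, 32)), rng.normal(size=(2, 32))
pp, pz = rng.normal(size=(49, 32)), rng.normal(size=(49, 32))
loss = multi_level_loss(gp, gz, pp, pz, k=4)
```

Note that identical prediction and target give zero loss, and each term is bounded in [0, 4]; the position-level term mentioned in the abstract (localization of the selected patches) is omitted here for brevity.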
We conduct extensive experiments with LPCL pre-trained directly on PASCAL VOC and COCO. On the PASCAL VOC image classification task, our model pre-trained on COCO achieves a state-of-the-art 86.2% accuracy (+9.7% top-1 accuracy over the BYOL baseline). On the object detection, instance segmentation, and semantic segmentation tasks, our model also achieves competitive results compared with other state-of-the-art methods.
ISSN:0031-3203
1873-5142
DOI:10.1016/j.patcog.2022.109185