Investigating Self-Supervised Methods for Label-Efficient Learning
Saved in:
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Vision transformers combined with self-supervised learning have enabled the development of models that scale across large datasets for several downstream tasks such as classification, segmentation and detection. The low-shot learning capability of these models across several low-shot downstream tasks has been largely underexplored. We perform a system-level study of different self-supervised pretext tasks, namely contrastive learning, clustering, and masked image modelling, comparing the pretrained models for their low-shot capabilities. In addition, we study the effects of collapse-avoidance methods, namely centring, ME-MAX, and Sinkhorn, on these downstream tasks. Based on our detailed analysis, we introduce a framework involving both masked image modelling and clustering as pretext tasks, which performs better across all low-shot downstream tasks, including multi-class classification, multi-label classification and semantic segmentation. Furthermore, when testing the model on full-scale datasets, we show performance gains in multi-class classification, multi-label classification and semantic segmentation.
DOI: 10.48550/arxiv.2406.17460
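
The abstract names three collapse-avoidance methods (centring, ME-MAX, Sinkhorn) without detailing them. The sketch below illustrates common formulations of each, applied to a batch of prototype logits in PyTorch; function names, shapes and hyperparameters (e.g. `eps`, `momentum`) are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of the three collapse-avoidance methods mentioned in the
# abstract, for a batch of prototype logits of shape (batch, prototypes).
import torch
import torch.nn.functional as F

def center_logits(logits, center, momentum=0.9):
    """Centring (DINO-style): subtract a running mean of the batch logits."""
    new_center = momentum * center + (1 - momentum) * logits.mean(dim=0)
    return logits - new_center, new_center

def me_max_regularizer(probs):
    """ME-MAX: negative entropy of the mean assignment over the batch.
    Adding this term to the loss maximizes that entropy, discouraging
    all samples from collapsing onto a single prototype."""
    mean_probs = probs.mean(dim=0)
    return (mean_probs * torch.log(mean_probs + 1e-8)).sum()

@torch.no_grad()
def sinkhorn(logits, n_iters=3, eps=0.05):
    """Sinkhorn-Knopp: alternately normalize prototypes and samples so the
    soft assignments are approximately balanced across prototypes."""
    q = torch.exp(logits / eps).t()            # (prototypes, batch)
    q = q / q.sum()
    K, B = q.shape
    for _ in range(n_iters):
        q = q / q.sum(dim=1, keepdim=True) / K  # balance prototypes
        q = q / q.sum(dim=0, keepdim=True) / B  # balance samples
    return (q * B).t()                          # each row sums to 1

# Illustrative usage (single view for brevity; in practice the targets
# would come from a different augmented view or a teacher network).
logits = torch.randn(32, 128)                   # (batch, prototypes)
targets = sinkhorn(logits)                      # balanced soft targets
loss = -(targets * F.log_softmax(logits, dim=-1)).sum(dim=-1).mean()
loss = loss + 0.1 * me_max_regularizer(F.softmax(logits, dim=-1))
```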