Coreset Sampling from Open-Set for Fine-Grained Self-Supervised Learning
Main Authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | Deep learning in general domains has steadily been extended to
domain-specific tasks requiring the recognition of fine-grained
characteristics. However, real-world applications for fine-grained tasks suffer
from two challenges: a high reliance on expert knowledge for annotation and
the need for a versatile model that serves various downstream tasks in a
specific domain (e.g., prediction of categories, bounding boxes, or pixel-wise
annotations). Fortunately, recent self-supervised learning (SSL) is a
promising approach to pretraining a model without annotations, serving as an
effective initialization for any downstream task. Since SSL does not rely on
the presence of annotations, it generally utilizes a large-scale unlabeled
dataset, referred to as an open-set. In this sense, we introduce a novel
Open-Set Self-Supervised Learning problem under the assumption that a
large-scale unlabeled open-set is available, as well as the fine-grained target
dataset, during the pretraining phase. In our problem setup, it is crucial to
consider the distribution mismatch between the open-set and the target dataset.
Hence, we propose the SimCore algorithm to sample a coreset, a subset of the
open-set that has a minimum distance to the target dataset in the latent space.
We demonstrate that SimCore significantly improves representation learning
performance across extensive experimental settings, including eleven
fine-grained datasets and seven open-sets in various downstream tasks. |
---|---|
DOI: | 10.48550/arxiv.2303.11101 |
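
The summary describes a coreset as the subset of the open-set that lies closest to the target dataset in the latent space. The sketch below illustrates that idea only in a simplified form and is not the SimCore procedure itself: it assumes embeddings have already been computed by some encoder, uses cosine similarity as the closeness measure, and keeps a fixed number of samples. The function names `cosine` and `select_coreset` and the `budget` parameter are hypothetical, chosen for this illustration.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors (assumed measure)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def select_coreset(open_emb, target_emb, budget):
    """Return indices of the `budget` open-set samples closest to the target set.

    Simplifying assumption: an open-set sample's closeness to the target
    dataset is its highest cosine similarity to any single target embedding.
    """
    closeness = [max(cosine(x, t) for t in target_emb) for x in open_emb]
    # Sort open-set indices from most to least similar to the target set.
    order = sorted(range(len(open_emb)), key=lambda i: -closeness[i])
    return order[:budget]


# Toy 2-D embeddings standing in for encoder outputs.
target = [[1.0, 0.0], [0.9, 0.1]]
open_set = [[0.0, 1.0], [1.0, 0.05], [-1.0, 0.0], [0.8, 0.2]]

picked = select_coreset(open_set, target, budget=2)
print(picked)
```

In this toy example the two open-set vectors pointing in roughly the same direction as the target embeddings are selected, while the orthogonal and opposite ones are left out. A fixed budget is the simplest possible choice here; how many samples to keep, and how to select them at scale, are design decisions this sketch does not address.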