Not All Semantics are Created Equal: Contrastive Self-supervised Learning with Automatic Temperature Individualization
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | In this paper, we aim to optimize a contrastive loss with individualized
temperatures in a principled and systematic manner for self-supervised
learning. The common practice of using a global temperature parameter $\tau$
ignores the fact that "not all semantics are created equal", meaning that
different anchor data may have different numbers of samples with similar
semantics, especially when the data exhibit long tails. First, we propose a new
robust contrastive loss inspired by distributionally robust optimization (DRO),
which provides an intuition about the effect of $\tau$ and a mechanism for
automatic temperature individualization. Then, we propose an efficient
stochastic algorithm for optimizing the robust contrastive loss, with a provable
convergence guarantee that does not require large mini-batch sizes. Theoretical and
experimental results show that our algorithm automatically learns a suitable
$\tau$ for each sample. Specifically, samples with frequent semantics use large
temperatures to preserve local semantic structures, while samples with rare
semantics use small temperatures to induce more separable features. Our method
not only outperforms strong prior baselines (e.g., SimCLR, CLIP) on unimodal
and bimodal datasets, with larger improvements on imbalanced data, but is also
less sensitive to hyperparameters. To the best of our knowledge, this is the first
methodical approach to optimizing a contrastive loss with individualized
temperatures. |
---|---|
DOI: | 10.48550/arxiv.2305.11965 |
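
To make the role of a per-sample temperature concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss in which each anchor is given its own $\tau$. It is not the paper's robust loss or its stochastic algorithm: the temperatures are passed in as inputs rather than learned, and the function names `individualized_contrastive_loss` and `negative_weights` are hypothetical, chosen only for this illustration.

```python
import math


def individualized_contrastive_loss(sim_pos, sim_neg, temps):
    """Mean InfoNCE-style loss where anchor i uses its own temperature temps[i].

    sim_pos[i]: similarity between anchor i and its positive sample
    sim_neg[i]: list of similarities between anchor i and its negatives
    temps[i]:   temperature assigned to anchor i (assumed given, must be > 0)
    """
    total = 0.0
    for s_pos, negs, tau in zip(sim_pos, sim_neg, temps):
        if tau <= 0:
            raise ValueError("temperatures must be positive")
        logits = [s_pos / tau] + [s / tau for s in negs]
        m = max(logits)  # subtract the max for numerical stability
        log_denom = m + math.log(sum(math.exp(x - m) for x in logits))
        total += log_denom - s_pos / tau  # -log softmax of the positive
    return total / len(sim_pos)


def negative_weights(negs, tau):
    """Softmax weights over one anchor's negatives at temperature tau."""
    scaled = [s / tau for s in negs]
    m = max(scaled)
    w = [math.exp(x - m) for x in scaled]
    z = sum(w)
    return [x / z for x in w]


# Two anchors with identical similarities but different temperatures.
loss = individualized_contrastive_loss(
    sim_pos=[0.8, 0.8],
    sim_neg=[[0.9, 0.1], [0.9, 0.1]],
    temps=[0.1, 1.0],
)

# A small tau concentrates weight on the hardest negative (similarity 0.9),
# while a large tau spreads weight more evenly across negatives.
sharp = negative_weights([0.9, 0.1], tau=0.1)
flat = negative_weights([0.9, 0.1], tau=1.0)
```

In this toy setting, the small temperature puts almost all of the weight on the most similar negative, which pushes it away more strongly. This matches the summary's statement that small temperatures induce more separable features. The large temperature treats negatives more uniformly, which is consistent with preserving local semantic structure.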