Self-Supervised Learning via Maximum Entropy Coding
Saved in:

| Main authors: | , , , |
|---|---|
| Format: | Article |
| Language: | eng |
| Online access: | Order full text |
Abstract:

A mainstream type of current self-supervised learning methods pursues a general-purpose representation that can be transferred well to downstream tasks, typically by optimizing on a given pretext task such as instance discrimination. In this work, we argue that existing pretext tasks inevitably introduce biases into the learned representation, which in turn leads to biased transfer performance on various downstream tasks. To cope with this issue, we propose Maximum Entropy Coding (MEC), a more principled objective that explicitly optimizes the structure of the representation, so that the learned representation is less biased and thus generalizes better to unseen downstream tasks. Inspired by the principle of maximum entropy in information theory, we hypothesize that a generalizable representation should be the one that admits the maximum entropy among all plausible representations. To make the objective end-to-end trainable, we propose to leverage the minimal coding length in lossy data coding as a computationally tractable surrogate for the entropy, and further derive a scalable reformulation of the objective that allows fast computation. Extensive experiments demonstrate that MEC learns a more generalizable representation than previous methods based on specific pretext tasks. It consistently achieves state-of-the-art performance on various downstream tasks, including not only ImageNet linear probing, but also semi-supervised classification, object detection, instance segmentation, and object tracking. Interestingly, we show that existing batch-wise and feature-wise self-supervised objectives can be seen as equivalent to low-order approximations of MEC. Code and pre-trained models are available at https://github.com/xinliu20/MEC.
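To make the abstract's objective concrete: following the paper's formulation, for two augmented views with embedding matrices \(Z_1, Z_2 \in \mathbb{R}^{d \times m}\) (a batch of \(m\) samples of dimension \(d\)), the coding-length surrogate for the entropy is maximized via

$$\mathcal{L}_{\mathrm{MEC}} = \mu \log\det\!\left(I_m + \lambda Z_1^\top Z_2\right) = \mu\,\mathrm{Tr}\!\left(\sum_{n=1}^{\infty} \frac{(-1)^{n+1}}{n}\left(\lambda Z_1^\top Z_2\right)^{n}\right),$$

with \(\mu = (m+d)/2\) and \(\lambda = d/(m\varepsilon^2)\) for a distortion level \(\varepsilon\). Truncating the Taylor series is the "scalable reformulation" the abstract mentions, since it avoids computing a determinant. The PyTorch snippet below is a minimal sketch of this truncated objective, not the reference implementation (see the linked repository for that); the function name, the distortion default `eps_sq`, and the truncation order are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def mec_loss(z1: torch.Tensor, z2: torch.Tensor,
             eps_sq: float = 0.06, order: int = 4) -> torch.Tensor:
    """Truncated Taylor-series sketch of a Maximum Entropy Coding loss.

    z1, z2: (m, d) embeddings of two augmented views of the same batch.
    Returns a scalar to minimize (negative of the coding-length objective).
    """
    m, d = z1.shape
    z1 = F.normalize(z1, dim=1)          # project embeddings onto the unit sphere
    z2 = F.normalize(z2, dim=1)
    lam = d / (m * eps_sq)               # lambda = d / (m * eps^2)
    mu = (m + d) / 2.0                   # coding-length prefactor
    c = lam * (z1 @ z2.T)                # (m, m) batch-wise similarity matrix
    power = c                            # running matrix power c^n
    series = torch.trace(power)          # n = 1 term of Tr(log(I + c))
    for n in range(2, order + 1):
        power = power @ c
        series = series + ((-1) ** (n + 1) / n) * torch.trace(power)
    return -mu * series                  # maximize entropy <=> minimize this
```

Note that with `order=1` the loss reduces to a scaled sum of per-sample cosine similarities, which illustrates the abstract's claim that batch-wise self-supervised objectives appear as low-order approximations of MEC.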
DOI: 10.48550/arxiv.2210.11464