Amortised Invariance Learning for Contrastive Self-Supervision
Format: Article
Language: English
Online access: Order full text
Abstract: Contrastive self-supervised learning methods famously produce high-quality transferable representations by learning invariances to different data augmentations. Invariances established during pre-training can be interpreted as strong inductive biases. However, these may or may not be helpful, depending on whether they match the invariance requirements of downstream tasks. This has led to several attempts to learn task-specific invariances during pre-training; however, these methods are highly compute-intensive and tedious to train. We introduce the notion of amortised invariance learning for contrastive self-supervision. In the pre-training stage, we parameterize the feature extractor by differentiable invariance hyper-parameters that control the invariances encoded by the representation. Then, for any downstream task, both the linear readout and the task-specific invariance requirements can be learned efficiently and effectively by gradient descent. We evaluate amortised invariances for contrastive learning over two modalities: in vision, with two widely used contrastive learning methods (SimCLR and MoCo-v2) and popular architectures such as ResNets and Vision Transformers; and in audio, with SimCLR and a ResNet-18. We show that our amortised features provide a reliable way to learn diverse downstream tasks with different invariance requirements, while using a single set of features and avoiding task-specific pre-training. This provides an exciting perspective that opens up new horizons in the field of general-purpose representation learning.
DOI: 10.48550/arxiv.2302.12712
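
The mechanism described in the abstract (a pre-trained feature extractor conditioned on differentiable invariance hyper-parameters, with each downstream task learning only a linear readout plus its own invariance weights by gradient descent) can be illustrated with a short sketch. This is a minimal, hypothetical PyTorch illustration and not the authors' implementation: the names `InvariantEncoder` and `inv_logits`, the FiLM-style gating used to condition features on the invariance vector, and the stand-in backbone are all assumptions made here for clarity.

```python
# Hypothetical sketch of amortised invariance learning (not the paper's code).
import torch
import torch.nn as nn

class InvariantEncoder(nn.Module):
    """Backbone whose features are modulated by K invariance hyper-parameters."""
    def __init__(self, backbone: nn.Module, feat_dim: int, num_invariances: int):
        super().__init__()
        self.backbone = backbone                    # e.g. a ResNet trunk (frozen downstream)
        # Map the invariance vector to per-channel gates (FiLM-style gating; an assumption).
        self.film = nn.Linear(num_invariances, feat_dim)

    def forward(self, x: torch.Tensor, inv: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)                    # (B, feat_dim)
        gamma = torch.sigmoid(self.film(inv))       # (feat_dim,) gates from invariance vector
        return feats * gamma                        # invariance-conditioned representation

# Downstream adaptation: freeze the pre-trained encoder, then learn only the
# linear readout and the task-specific invariance hyper-parameters by gradient descent.
feat_dim, num_invariances, num_classes = 512, 5, 10
backbone = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, feat_dim))  # stand-in trunk
encoder = InvariantEncoder(backbone, feat_dim, num_invariances)
for p in encoder.parameters():
    p.requires_grad_(False)

readout = nn.Linear(feat_dim, num_classes)
inv_logits = nn.Parameter(torch.zeros(num_invariances))   # learnable invariance strengths
opt = torch.optim.Adam(list(readout.parameters()) + [inv_logits], lr=1e-3)

x, y = torch.randn(8, 3, 32, 32), torch.randint(0, num_classes, (8,))
for _ in range(10):
    inv = torch.sigmoid(inv_logits)                 # keep hyper-parameters in [0, 1]
    logits = readout(encoder(x, inv))
    loss = nn.functional.cross_entropy(logits, y)
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point mirrored from the abstract is that the pre-trained encoder stays fixed for every downstream task; only the readout weights and the invariance hyper-parameters receive gradients, so task-specific invariances are recovered without any task-specific pre-training.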