Self-Supervised Visual Representation Learning via Residual Momentum
Self-supervised learning (SSL) approaches have shown promising capabilities in learning the representation from unlabeled data. Amongst them, momentum-based frameworks have attracted significant attention. Despite being a great success, these momentum-based SSL frameworks suffer from a large gap in...
Gespeichert in:
Hauptverfasser: | , , , , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Self-supervised learning (SSL) approaches have shown promising capabilities
in learning the representation from unlabeled data. Amongst them,
momentum-based frameworks have attracted significant attention. Despite being a
great success, these momentum-based SSL frameworks suffer from a large gap in
representation between the online encoder (student) and the momentum encoder
(teacher), which hinders performance on downstream tasks. This paper is the
first to investigate and identify this invisible gap as a bottleneck that has
been overlooked in the existing SSL frameworks, potentially preventing the
models from learning good representation. To solve this problem, we propose
"residual momentum" to directly reduce this gap to encourage the student to
learn the representation as close to that of the teacher as possible, narrow
the performance gap with the teacher, and significantly improve the existing
SSL. Our method is straightforward, easy to implement, and can be easily
plugged into other SSL frameworks. Extensive experimental results on numerous
benchmark datasets and diverse network architectures have demonstrated the
effectiveness of our method over the state-of-the-art contrastive learning
baselines. |
---|---|
DOI: | 10.48550/arxiv.2211.09861 |