An Embedding-Dynamic Approach to Self-supervised Learning
Saved in:
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Order full text |
Abstract: | A number of recent self-supervised learning methods have shown impressive
performance on image classification and other tasks. A somewhat bewildering
variety of techniques have been used, not always with a clear understanding of
the reasons for their benefits, especially when used in combination. Here we
treat the embeddings of images as point particles and consider model
optimization as a dynamic process on this system of particles. Our dynamic
model combines an attractive force for similar images, a locally dispersive
force to avoid local collapse, and a global dispersive force to achieve a
globally homogeneous distribution of particles. The dynamic perspective
highlights the advantage of using a delayed-parameter image embedding (à la
BYOL) together with multiple views of the same image. It also uses a
purely dynamic local dispersive force (Brownian motion) that shows improved
performance over other methods and does not require knowledge of other particle
coordinates. The method is called MSBReg, which stands for (i) a Multiview
centroid loss, which applies an attractive force to pull different image-view
embeddings toward their centroid, (ii) a Singular value loss, which pushes the
particle system toward a spatially homogeneous density, and (iii) a Brownian
diffusive loss. We evaluate the downstream classification performance of MSBReg on
ImageNet as well as on transfer-learning tasks including fine-grained
classification, multi-class object classification, object detection, and
instance segmentation. In addition, we show that applying our
regularization term to other methods further improves their performance and
stabilizes training by preventing mode collapse. |
---|---|
DOI: | 10.48550/arxiv.2207.03552 |
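The three forces named in the abstract can be sketched as follows. This is a hedged illustration reconstructed only from the abstract's description, not the authors' implementation: the function names, the noise scale `sigma`, and the specific form of each penalty (mean-squared distance to the centroid, variance of the singular values of the centered embedding matrix, and an additive Gaussian displacement) are all assumptions.

```python
import numpy as np

def multiview_centroid_loss(views):
    # views: (n_views, batch, dim) embeddings of multiple views of the same
    # batch of images. Attractive force: pull each view's embedding toward
    # the per-image centroid of its views.
    centroid = views.mean(axis=0, keepdims=True)
    return ((views - centroid) ** 2).sum(axis=-1).mean()

def singular_value_loss(z):
    # z: (batch, dim) embedding matrix. Push the particle system toward a
    # spatially homogeneous density by penalizing spread in the singular
    # values of the centered embedding matrix (assumption: roughly uniform
    # singular values correspond to an isotropic distribution).
    s = np.linalg.svd(z - z.mean(axis=0), compute_uv=False)
    return ((s - s.mean()) ** 2).mean()

def brownian_diffusive_step(z, sigma=0.1, rng=None):
    # Purely dynamic local dispersive force: displace each particle by
    # independent Gaussian noise (Brownian motion). Note this needs no
    # knowledge of the other particles' coordinates.
    rng = np.random.default_rng() if rng is None else rng
    return z + sigma * rng.standard_normal(z.shape)
```

Note the asymmetry the abstract implies: the first two terms are ordinary losses to be minimized, while the Brownian term acts directly on the particle coordinates as a random displacement rather than through a gradient.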