Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning
Format: Article
Language: English
Abstract: We present a novel frequency-based Self-Supervised Learning (SSL) approach
that significantly enhances its efficacy for pre-training. Prior work in this
direction masks out pre-defined frequencies in the input image and employs a
reconstruction loss to pre-train the model. While achieving promising results,
such an implementation has two fundamental limitations as identified in our
paper. First, using pre-defined frequencies overlooks the variability of image
frequency responses. Second, having been pre-trained on frequency-filtered images, the
resulting model needs relatively more data to adapt to natural-looking images
during fine-tuning. To address these drawbacks, we propose FOurier transform
compression with seLf-Knowledge distillation (FOLK), integrating two dedicated
ideas. First, inspired by image compression, we adaptively select the
masked-out frequencies based on image frequency responses, creating more
suitable SSL tasks for pre-training. Second, we employ a two-branch framework
empowered by knowledge distillation, enabling the model to take both the
filtered and original images as input, largely reducing the burden on
downstream tasks. Our experimental results demonstrate that
FOLK achieves performance competitive with many state-of-the-art SSL methods
across various downstream tasks, including image classification, few-shot
learning, and semantic segmentation.
DOI: 10.48550/arxiv.2409.10362
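The adaptive frequency selection described in the abstract can be sketched as follows. This is a hypothetical illustration only, not the authors' implementation: the choice to mask the highest-magnitude FFT coefficients and the `mask_ratio` parameter are assumptions made for demonstration, inspired by the abstract's analogy to image compression.

```python
import numpy as np

def adaptive_frequency_mask(image, mask_ratio=0.25):
    """Mask out the strongest frequency components of an image.

    A hypothetical reading of "adaptively select the masked-out
    frequencies based on image frequency responses": instead of
    masking pre-defined frequency bands, each image's own spectrum
    decides which coefficients are removed.
    """
    # 2D FFT of the (single-channel) image
    spectrum = np.fft.fft2(image)
    magnitude = np.abs(spectrum)

    # Keep the (1 - mask_ratio) fraction of coefficients with the
    # smallest magnitudes; zero out the strongest responses.
    k = int(magnitude.size * (1 - mask_ratio))
    threshold = np.partition(magnitude.ravel(), k)[k]
    keep = magnitude < threshold

    # Reconstruct the filtered image from the remaining frequencies;
    # this filtered view would feed the reconstruction/distillation branch.
    filtered = np.fft.ifft2(spectrum * keep).real
    return filtered, keep

# Usage: produce a per-image filtered view for pre-training
img = np.random.rand(32, 32)
filt, keep = adaptive_frequency_mask(img, mask_ratio=0.25)
```

Because the mask is derived from each image's own magnitude spectrum, two images with different frequency content receive different masks, which is the contrast the abstract draws against pre-defined frequency masking.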