Provable Contrastive Continual Learning
Format: Article
Language: English
Online access: Order full text
Abstract: Continual learning requires learning incremental tasks with dynamic data distributions. So far, it has been observed that employing a combination of contrastive loss and distillation loss for training in continual learning yields strong performance. To the best of our knowledge, however, this contrastive continual learning framework lacks convincing theoretical explanations. In this work, we fill this gap by establishing theoretical performance guarantees, which reveal how the performance of the model is bounded by the training losses of previous tasks in the contrastive continual learning framework. Our theoretical analysis further supports the idea that pre-training can benefit continual learning. Inspired by these guarantees, we propose a novel contrastive continual learning algorithm called CILA, which uses adaptive distillation coefficients for different tasks. Each coefficient is computed simply as the ratio of the average distillation loss to the average contrastive loss from previous tasks. Our method shows substantial improvements on standard benchmarks and achieves new state-of-the-art performance.
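As a rough illustration of the coefficient rule described in the abstract, the sketch below computes an adaptive distillation coefficient as the ratio of the average distillation loss to the average contrastive loss recorded on previous tasks, and plugs it into a combined training objective. The function names, the fallback value used for the first task, and the numerical floor on the denominator are illustrative assumptions; the paper's actual contrastive and distillation losses are not reproduced here.

```python
def adaptive_coefficient(past_distill_losses, past_contrastive_losses, default=1.0):
    """Coefficient = mean(past distillation losses) / mean(past contrastive losses)."""
    if not past_distill_losses or not past_contrastive_losses:
        # No history before the first task; fall back to a fixed coefficient (assumed).
        return default
    avg_distill = sum(past_distill_losses) / len(past_distill_losses)
    avg_contrast = sum(past_contrastive_losses) / len(past_contrastive_losses)
    # Small floor on the denominator to avoid division by zero (assumed detail).
    return avg_distill / max(avg_contrast, 1e-12)


def combined_loss(contrastive_loss, distillation_loss, coeff):
    """Per-batch objective: contrastive term plus weighted distillation term."""
    return contrastive_loss + coeff * distillation_loss
```

For example, if previous tasks logged an average distillation loss of 0.8 and an average contrastive loss of 4.0, the coefficient applied to the distillation term on the next task would be 0.2 under this rule.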
DOI: 10.48550/arxiv.2405.18756