Dual-CBA: Improving Online Continual Learning via Dual Continual Bias Adaptors from a Bi-level Optimization Perspective
Format: Article
Language: English
Abstract: In online continual learning (CL), models trained on changing distributions
easily forget previously learned knowledge and bias toward newly received
tasks. To address this issue, we present Continual Bias Adaptor (CBA), a
bi-level framework that augments the classification network to adapt to
catastrophic distribution shifts during training, enabling the network to
achieve a stable consolidation of all seen tasks. However, the CBA module
adjusts distribution shifts in a class-specific manner, exacerbating the
stability gap issue and, to some extent, fails to meet the need for continual
testing in online CL. To mitigate this challenge, we further propose a novel
class-agnostic CBA module that separately aggregates the posterior
probabilities of classes from new and old tasks, and applies a stable
adjustment to the resulting posterior probabilities. We combine the two kinds
of CBA modules into a unified Dual-CBA module, which thus adapts to
catastrophic distribution shifts while simultaneously meeting the
real-time testing requirements of online CL. In addition, we propose Incremental
Batch Normalization (IBN), a tailored BN module to re-estimate its population
statistics for alleviating the feature bias arising from the inner loop
optimization problem of our bi-level framework. To validate the effectiveness
of the proposed method, we theoretically provide some insights into how it
mitigates catastrophic distribution shifts, and empirically demonstrate its
superiority through extensive experiments based on four rehearsal-based
baselines and three public continual learning benchmarks.
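The class-agnostic CBA described in the abstract separately aggregates the posterior probabilities of new-task and old-task classes and applies a stable adjustment to the aggregated masses. A minimal NumPy sketch of that idea follows; the function name, the two-way learnable weights, and the within-group renormalization scheme are assumptions for illustration, not the authors' implementation:

```python
import numpy as np

def class_agnostic_cba(logits, old_mask, group_logits):
    """Sketch of a class-agnostic adjustment (hypothetical implementation).

    logits:       (B, C) classifier outputs for a batch.
    old_mask:     boolean (C,) array marking classes from previously seen tasks.
    group_logits: (2,) learnable parameters giving target posterior shares
                  for the old-task and new-task class groups.
    """
    # Softmax over classes to get per-class posteriors.
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)

    # Stable two-way adjustment: target shares for the old/new groups.
    gw = np.exp(group_logits - group_logits.max())
    gw /= gw.sum()

    # Aggregate posterior mass of each group, per example.
    mass_old = p[:, old_mask].sum(axis=1, keepdims=True)
    mass_new = p[:, ~old_mask].sum(axis=1, keepdims=True)

    # Rescale each group toward its target share, keeping the
    # within-group class proportions unchanged.
    q = p.copy()
    q[:, old_mask] = p[:, old_mask] / np.maximum(mass_old, 1e-8) * gw[0]
    q[:, ~old_mask] = p[:, ~old_mask] / np.maximum(mass_new, 1e-8) * gw[1]
    return q  # rows still sum to 1, since gw sums to 1
```

Because the adjustment acts on the two aggregated group masses rather than on individual class probabilities, it is agnostic to which particular class dominates within a group, which is what lets it apply a stable correction across the new/old task boundary.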
DOI: 10.48550/arxiv.2408.13991