Continual Adaptation of Vision Transformers for Federated Learning
Format: | Article |
Language: | English |
Online access: | Order full text |
Abstract: | In this paper, we focus on the important yet understudied problem of
Continual Federated Learning (CFL), where a server communicates with a set of
clients to incrementally learn new concepts over time without sharing or
storing any data. The complexity of this problem is compounded by challenges
from both the Continual and Federated Learning perspectives. Specifically,
models trained in a CFL setup suffer from catastrophic forgetting, which is
exacerbated by data heterogeneity across clients. Existing attempts at this
problem tend to impose large overheads on clients and communication channels or
require access to stored data, which renders them unsuitable for real-world use
due to privacy concerns. In this paper, we attempt to tackle forgetting and
heterogeneity while minimizing overhead costs and without requiring access to
any stored data. We study this problem in the context of Vision Transformers
and explore parameter-efficient approaches to adapt to dynamic distributions
while minimizing forgetting. We achieve this by leveraging a prompting-based
approach (such that only prompts and classifier heads have to be communicated)
and proposing a novel and lightweight generation and distillation scheme to
consolidate client models at the server. We formulate this problem for image
classification, establish strong baselines for comparison, and conduct
experiments on CIFAR-100 as well as on challenging, large-scale datasets like
ImageNet-R and DomainNet. Our approach outperforms both existing methods and
our own baselines by as much as 7% while significantly reducing communication
and client-level computation costs. Code available at
https://github.com/shaunak27/hepco-fed. |
DOI: | 10.48550/arxiv.2306.09970 |
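The abstract's key efficiency claim is that only prompts and classifier heads travel between clients and the server while the Vision Transformer backbone stays frozen and local. Below is a minimal sketch of that communication pattern, not the authors' released implementation: `PromptedClassifier` and `fedavg` are hypothetical names chosen for illustration, and the server step uses plain parameter averaging rather than the paper's generation-and-distillation consolidation scheme.

```python
# Minimal sketch of parameter-efficient federated communication:
# clients train only prompt tokens and a linear head on top of a frozen
# ViT backbone, and only those small tensors are exchanged with the server.
import copy
import torch
import torch.nn as nn


class PromptedClassifier(nn.Module):
    """Learnable prompt tokens plus a classifier head; the backbone never moves."""

    def __init__(self, feat_dim: int = 768, num_prompts: int = 10, num_classes: int = 100):
        super().__init__()
        self.prompts = nn.Parameter(0.02 * torch.randn(num_prompts, feat_dim))
        self.head = nn.Linear(feat_dim, num_classes)

    def trainable_state(self) -> dict:
        # Only these tensors ever leave the client.
        return {
            "prompts": self.prompts.detach().clone(),
            "head.weight": self.head.weight.detach().clone(),
            "head.bias": self.head.bias.detach().clone(),
        }

    def load_trainable_state(self, state: dict) -> None:
        with torch.no_grad():
            self.prompts.copy_(state["prompts"])
            self.head.weight.copy_(state["head.weight"])
            self.head.bias.copy_(state["head.bias"])


def fedavg(states: list) -> dict:
    """Uniformly average the lightweight client updates on the server."""
    return {k: torch.stack([s[k] for s in states]).mean(dim=0) for k in states[0]}


if __name__ == "__main__":
    # One communication round with three clients; local training is elided.
    server = PromptedClassifier()
    client_states = []
    for _ in range(3):
        client = copy.deepcopy(server)  # client receives the current prompts/head
        # ... local training on private data would update client.prompts / client.head ...
        client_states.append(client.trainable_state())
    server.load_trainable_state(fedavg(client_states))
```

With a ViT-B/16 backbone, the exchanged state in this sketch is on the order of tens of thousands of parameters per client, versus roughly 86M parameters for the full backbone, which is the kind of communication saving the abstract refers to.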