DaWin: Training-free Dynamic Weight Interpolation for Robust Adaptation
Format: Article
Language: English
Abstract: Adapting a pre-trained foundation model to downstream tasks should
ensure robustness against distribution shifts without the need to retrain the
whole model. Although existing weight interpolation methods are simple yet
effective, we argue that their static nature limits downstream performance
even as it yields efficiency. In this work, we propose DaWin, a training-free
dynamic weight interpolation method that leverages the entropy of the
individual models on each unlabeled test sample to assess model expertise and
computes per-sample interpolation coefficients dynamically. Unlike previous
works that typically rely on additional training to learn such coefficients,
our approach requires no training. We then propose a mixture modeling approach
that greatly reduces the inference overhead introduced by dynamic
interpolation. We validate DaWin on large-scale visual recognition benchmarks
spanning 14 tasks: robust fine-tuning on ImageNet and five derived
distribution shift benchmarks, and multi-task learning across eight
classification tasks. Results demonstrate that DaWin achieves significant
performance gains in the considered settings with minimal computational
overhead. We further discuss DaWin's analytic behavior to explain its
empirical success.
DOI: 10.48550/arxiv.2410.03782
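
For illustration, the core mechanism the abstract describes, deriving a
per-sample interpolation coefficient from each model's predictive entropy, can
be sketched as below. This is a minimal sketch under stated assumptions, not
the authors' reference implementation: the two models (`model_a`, `model_b`,
assumed to share an architecture and be in eval mode), the normalized
inverse-entropy coefficient, and the per-sample loop are all illustrative
choices; the paper's exact coefficient formula and its mixture-modeling
speedup are not reproduced here.

```python
# Sketch of entropy-based dynamic weight interpolation in the spirit of DaWin.
# Assumptions: two classifiers with identical architectures (e.g., a zero-shot
# and a fine-tuned model), eval mode, same device; the coefficient below
# (normalized inverse entropy) is an illustrative stand-in for the paper's
# exact expression.
import torch
import torch.nn.functional as F


def entropy(logits: torch.Tensor) -> torch.Tensor:
    """Shannon entropy of the softmax distribution, one value per sample."""
    log_p = F.log_softmax(logits, dim=-1)
    return -(log_p.exp() * log_p).sum(dim=-1)


@torch.no_grad()
def dynamic_interpolation_predict(model_a, model_b, x):
    """Predict with per-sample interpolated weights.

    The model with lower predictive entropy on a sample is treated as the
    greater 'expert' there and receives a larger interpolation coefficient.
    """
    h_a = entropy(model_a(x))  # shape: (batch,)
    h_b = entropy(model_b(x))  # shape: (batch,)
    # Illustrative coefficient (assumption): the weight on model_a grows as
    # its entropy shrinks relative to model_b's.
    lam = h_b / (h_a + h_b + 1e-12)  # shape: (batch,), each value in [0, 1]

    # Clone model_a's weights so they can be restored afterwards;
    # state_dict() returns live tensors, so plain references would be mutated.
    sd_a = {k: v.clone() for k, v in model_a.state_dict().items()}
    sd_b = model_b.state_dict()

    outputs = []
    for i in range(x.size(0)):
        # Interpolate every parameter with this sample's own coefficient.
        merged = {k: lam[i] * sd_a[k] + (1.0 - lam[i]) * sd_b[k] for k in sd_a}
        model_a.load_state_dict(merged)  # reuse model_a as scratch space
        outputs.append(model_a(x[i : i + 1]))

    model_a.load_state_dict(sd_a)  # restore the original weights
    return torch.cat(outputs, dim=0)
```

Note that the per-sample loop makes naive dynamic interpolation expensive at
inference time; this is exactly the overhead that the mixture modeling
approach mentioned in the abstract is intended to reduce.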