Model Stock: All we need is just a few fine-tuned models
Format: Article
Language: English
Abstract:
This paper introduces an efficient fine-tuning method for large pre-trained models, offering strong in-distribution (ID) and out-of-distribution (OOD) performance. Breaking away from traditional practices that need a multitude of fine-tuned models for averaging, our approach employs significantly fewer models to obtain the final weights, yet yields superior accuracy. Drawing on key insights into the weight space of fine-tuned models, we uncover a strong link between performance and proximity to the center of the weight space. Based on this, we introduce a method that approximates a center-close weight using only two fine-tuned models, applicable during or after training. Our layer-wise weight averaging technique surpasses state-of-the-art model averaging methods such as Model Soup while utilizing only two fine-tuned models. We coin this strategy Model Stock, highlighting its reliance on selecting a minimal number of models to draw a more optimized averaged model. We demonstrate the efficacy of Model Stock with fine-tuned models based on pre-trained CLIP architectures, achieving remarkable performance on both ID and OOD tasks on standard benchmarks, all while adding negligible computational overhead. Our code and pre-trained models are available at https://github.com/naver-ai/model-stock.
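The layer-wise averaging idea described in the abstract, averaging two fine-tuned models per layer and pulling that average toward the pre-trained weights, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `model_stock_merge` is hypothetical, and the per-layer interpolation ratio (computed from the angle between the two fine-tuning updates) is one plausible reading of the center-of-weight-space argument; the repository linked above contains the official code.

```python
import torch

def model_stock_merge(pretrained_sd, finetuned_sd_1, finetuned_sd_2):
    """Sketch of a layer-wise merge of two fine-tuned models toward the
    pre-trained anchor. All three arguments are PyTorch state dicts with
    identical keys."""
    merged = {}
    for name, w0 in pretrained_sd.items():
        w1, w2 = finetuned_sd_1[name], finetuned_sd_2[name]
        if not torch.is_floating_point(w0):
            merged[name] = w0.clone()  # leave integer buffers (e.g. counters) untouched
            continue
        # Fine-tuning updates relative to the pre-trained weights, per layer.
        d1, d2 = (w1 - w0).flatten(), (w2 - w0).flatten()
        cos = torch.dot(d1, d2) / (d1.norm() * d2.norm() + 1e-12)
        # Assumed angle-based ratio: the more the two updates agree, the more
        # weight the fine-tuned average receives; clamped for numerical safety.
        t = (2.0 * cos / (1.0 + cos + 1e-12)).clamp(0.0, 1.0)
        w_avg = (w1 + w2) / 2.0  # plain average of the two fine-tuned models
        # Interpolate toward the pre-trained center of the weight space.
        merged[name] = t * w_avg + (1.0 - t) * w0
    return merged
```

In practice the merged state dict would be loaded back into the original architecture with `model.load_state_dict(merged)`, which is what keeps the procedure essentially free compared with training or averaging dozens of fine-tuned models.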
DOI: 10.48550/arxiv.2403.19522