ImageNet-RIB Benchmark: Large Pre-Training Datasets Don't Guarantee Robustness after Fine-Tuning
Saved in: |  |
Main authors: |  |
Format: | Article |
Language: | eng |
Subjects: |  |
Online access: | Order full text |
Summary: | Highly performant large-scale pre-trained models promise to also provide a valuable foundation for learning specialized tasks by fine-tuning the model to the desired task. By starting from a good general-purpose model, the goal is to achieve specialization in the target task while maintaining robustness. To assess the robustness of models to out-of-distribution samples after fine-tuning on downstream datasets, we introduce a new robust fine-tuning benchmark, ImageNet-RIB (Robustness Inheritance Benchmark). The benchmark consists of a set of related but distinct specialized (downstream) tasks; pre-trained models are fine-tuned on one task in the set and their robustness is assessed on the rest, iterating across all tasks for fine-tuning and assessment. We find that the continual learning methods EWC and LwF maintain robustness after fine-tuning, though fine-tuning generally reduces generalization performance on related downstream tasks across models. Not surprisingly, models pre-trained on large and rich datasets exhibit higher initial robustness across datasets, yet they suffer more pronounced degradation during fine-tuning. The distance between the pre-training and downstream datasets, measured by optimal transport, predicts this performance degradation on the pre-training dataset. However, counterintuitively, model robustness after fine-tuning on related downstream tasks is worst when the pre-training dataset is the richest and most diverse. This suggests that starting with the strongest foundation model is not necessarily the best approach for performance on specialist tasks. The benchmark thus offers key insights for developing more resilient fine-tuning strategies and building robust machine learning models. https://jd730.github.io/projects/ImageNet-RIB |
DOI: | 10.48550/arxiv.2410.21582 |
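The fine-tune-on-one-task, evaluate-on-the-rest protocol described in the summary can be sketched as a simple loop. This is a minimal illustration under stated assumptions, not the authors' implementation: the model factory, `finetune`, `evaluate`, and the task list are hypothetical callables supplied by the caller.

```python
# Minimal sketch of an ImageNet-RIB-style evaluation loop.
# All callables and task names are hypothetical placeholders.
from typing import Callable, Dict, Sequence


def run_rib_protocol(
    make_pretrained_model: Callable[[], object],  # returns a fresh pre-trained model
    finetune: Callable[[object, str], object],    # fine-tunes the model on one downstream task
    evaluate: Callable[[object, str], float],     # robustness/accuracy metric on a task
    tasks: Sequence[str],                         # related but distinct downstream datasets
) -> Dict[str, Dict[str, float]]:
    """Fine-tune on each task in turn and assess robustness on the held-out rest."""
    results: Dict[str, Dict[str, float]] = {}
    for finetune_task in tasks:
        model = make_pretrained_model()           # restart from the same foundation each time
        model = finetune(model, finetune_task)
        # Robustness is assessed on every *other* task in the benchmark set.
        results[finetune_task] = {
            eval_task: evaluate(model, eval_task)
            for eval_task in tasks
            if eval_task != finetune_task
        }
    return results
```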
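The summary also mentions measuring the distance between the pre-training and downstream datasets with optimal transport. Below is a sketch of one plausible way to compute such a distance over feature embeddings using the POT library; the random embeddings and the squared-Euclidean ground cost are assumptions for illustration, not the paper's exact procedure.

```python
# Sketch: optimal-transport distance between two empirical feature distributions.
# Feature extraction (e.g. with a frozen backbone) is assumed to have happened already.
import numpy as np
import ot  # POT: Python Optimal Transport


def dataset_ot_distance(src_features: np.ndarray, tgt_features: np.ndarray) -> float:
    """Exact OT (earth mover's) cost between two sets of feature embeddings."""
    a = ot.unif(len(src_features))                # uniform weights over source samples
    b = ot.unif(len(tgt_features))                # uniform weights over target samples
    cost = ot.dist(src_features, tgt_features)    # pairwise squared-Euclidean costs
    return float(ot.emd2(a, b, cost))             # optimal transport cost


# Toy usage with random embeddings standing in for backbone features.
rng = np.random.default_rng(0)
pretrain_feats = rng.normal(size=(256, 128))
downstream_feats = rng.normal(loc=0.5, size=(256, 128))
print(dataset_ot_distance(pretrain_feats, downstream_feats))
```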