A Data-Based Perspective on Transfer Learning
Format: Article
Language: English
Abstract: It is commonly believed that in transfer learning, including more pre-training data translates into better performance. However, recent evidence suggests that removing data from the source dataset can actually help too. In this work, we take a closer look at the role of the source dataset's composition in transfer learning and present a framework for probing its impact on downstream performance. Our framework gives rise to new capabilities, such as pinpointing transfer learning brittleness, as well as detecting pathologies such as data leakage and the presence of misleading examples in the source dataset. In particular, we demonstrate that removing detrimental datapoints identified by our framework improves transfer learning performance from ImageNet on a variety of target tasks. Code is available at https://github.com/MadryLab/data-transfer
DOI: 10.48550/arxiv.2207.05739
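The abstract's core intervention — identify source datapoints that hurt downstream performance and remove them before pre-training — can be illustrated with a minimal sketch. This is not the authors' implementation; the influence scores and example names below are entirely hypothetical, standing in for whatever per-example effect estimates such a framework would produce.

```python
# Hedged sketch (not the paper's code): given hypothetical per-example
# "transfer influence" scores for a target task, drop source datapoints
# whose estimated effect is negative, i.e. the detrimental examples,
# before pre-training on the filtered source dataset.

def filter_source_dataset(examples, influence_scores, threshold=0.0):
    """Keep only source examples whose estimated influence on the
    downstream task is at least `threshold`."""
    assert len(examples) == len(influence_scores)
    return [ex for ex, s in zip(examples, influence_scores) if s >= threshold]

# Toy data with made-up scores: negative values mark examples that
# (hypothetically) hurt transfer to the target task.
source = ["img_0", "img_1", "img_2", "img_3"]
scores = [0.8, -0.3, 0.1, -1.2]
kept = filter_source_dataset(source, scores)
print(kept)  # ['img_0', 'img_2']
```

The filtered list would then serve as the pre-training set, with the model subsequently fine-tuned on the target task as usual.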