Revisiting ResNets: Improved Training and Scaling Strategies
Main Authors: | |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | Novel computer vision architectures monopolize the spotlight, but the impact of the model architecture is often conflated with simultaneous changes to training methodology and scaling strategies. Our work revisits the canonical ResNet (He et al., 2015) and studies these three aspects in an effort to disentangle them. Perhaps surprisingly, we find that training and scaling strategies may matter more than architectural changes, and further, that the resulting ResNets match recent state-of-the-art models. We show that the best performing scaling strategy depends on the training regime and offer two new scaling strategies: (1) scale model depth in regimes where overfitting can occur (width scaling is preferable otherwise); (2) increase image resolution more slowly than previously recommended (Tan & Le, 2019). Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet. In a large-scale semi-supervised learning setup, ResNet-RS achieves 86.2% top-1 ImageNet accuracy, while being 4.7x faster than EfficientNet NoisyStudent. The training techniques improve transfer performance on a suite of downstream tasks (rivaling state-of-the-art self-supervised algorithms) and extend to video classification on Kinetics-400. We recommend practitioners use these simple revised ResNets as baselines for future research. |
DOI: | 10.48550/arxiv.2103.07579 |
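
The abstract's two scaling recommendations can be read as a simple configuration rule. Below is a minimal Python sketch of that rule, assuming an epoch-count proxy for the overfitting-prone regime and a square-root schedule for resolution growth; the `scale_config` helper, the 100-epoch threshold, and the specific factors are illustrative assumptions, not details taken from the ResNet-RS paper.

```python
# Illustrative sketch only (not code from the paper): a toy helper that applies
# the two scaling recommendations from the abstract. The function name, the
# 100-epoch threshold for the "overfitting-prone" regime, and the square-root
# resolution growth are hypothetical choices made for illustration.

def scale_config(depth, width_mult, resolution, epochs, compound_factor=2.0):
    """Return a scaled (depth, width_mult, resolution) model configuration.

    Strategy 1: scale depth in long-training regimes where overfitting can
    occur; otherwise scale width.
    Strategy 2: grow image resolution more slowly than depth/width.
    """
    overfitting_prone = epochs >= 100  # assumed proxy for the overfitting regime

    if overfitting_prone:
        depth = int(round(depth * compound_factor))    # prefer depth scaling
    else:
        width_mult = width_mult * compound_factor      # prefer width scaling

    # Sub-linear resolution growth (slower than the compound factor itself).
    resolution = int(round(resolution * compound_factor ** 0.5))
    return depth, width_mult, resolution


if __name__ == "__main__":
    # Long-training regime: depth is doubled, resolution grows by only ~1.4x.
    print(scale_config(depth=50, width_mult=1.0, resolution=224, epochs=350))
    # Short-training regime: width is doubled instead.
    print(scale_config(depth=50, width_mult=1.0, resolution=224, epochs=10))
```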