Revisiting ResNets: Improved Training and Scaling Strategies

Novel computer vision architectures monopolize the spotlight, but the impact of the model architecture is often conflated with simultaneous changes to training methodology and scaling strategies. Our work revisits the canonical ResNet (He et al., 2015) and studies these three aspects in an effort to disentangle them. Perhaps surprisingly, we find that training and scaling strategies may matter more than architectural changes, and further, that the resulting ResNets match recent state-of-the-art models. We show that the best performing scaling strategy depends on the training regime and offer two new scaling strategies: (1) scale model depth in regimes where overfitting can occur (width scaling is preferable otherwise); (2) increase image resolution more slowly than previously recommended (Tan & Le, 2019). Using improved training and scaling strategies, we design a family of ResNet architectures, ResNet-RS, which are 1.7x - 2.7x faster than EfficientNets on TPUs, while achieving similar accuracies on ImageNet. In a large-scale semi-supervised learning setup, ResNet-RS achieves 86.2% top-1 ImageNet accuracy, while being 4.7x faster than EfficientNet NoisyStudent. The training techniques improve transfer performance on a suite of downstream tasks (rivaling state-of-the-art self-supervised algorithms) and extend to video classification on Kinetics-400. We recommend practitioners use these simple revised ResNets as baselines for future research.
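
The abstract's two scaling rules can be made concrete with a short sketch. The function below is a minimal illustration, not the authors' code: the name scale_resnet, the base configuration, the 2.0 factor, and the square-root resolution growth are all assumptions chosen for readability, not values reported in the paper.

```python
# A minimal sketch, assuming illustrative multipliers; these are NOT the
# settings reported in the ResNet-RS paper.
def scale_resnet(base_depth=50, base_width=1.0, base_res=160,
                 factor=2.0, overfitting_regime=True):
    """Return a hypothetical (depth, width_multiplier, resolution) triple."""
    if overfitting_regime:
        # Rule (1): prefer depth scaling in regimes where overfitting can occur.
        depth = int(base_depth * factor)
        width = base_width
    else:
        # Otherwise width scaling is preferable.
        depth = base_depth
        width = base_width * factor
    # Rule (2): grow image resolution more slowly than the compound scaling
    # of Tan & Le (2019); sqrt growth here is an assumption for illustration.
    resolution = int(base_res * factor ** 0.5)
    return depth, width, resolution

print(scale_resnet())                          # (100, 1.0, 226)
print(scale_resnet(overfitting_regime=False))  # (50, 2.0, 226)
```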

Bibliographic Details
Main Authors: Bello, Irwan; Fedus, William; Du, Xianzhi; Cubuk, Ekin D; Srinivas, Aravind; Lin, Tsung-Yi; Shlens, Jonathon; Zoph, Barret
Format: Article
Language: English
Published: 2021-03-12
DOI: 10.48550/arxiv.2103.07579
Source: arXiv.org
Subjects: Computer Science - Computer Vision and Pattern Recognition
Online Access: https://arxiv.org/abs/2103.07579