GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism
Scaling up deep neural network capacity has been known as an effective approach to improving model quality for several different machine learning tasks. In many cases, increasing model capacity beyond the memory limit of a single accelerator has required developing special algorithms or infrastructure. These solutions are often architecture-specific and do not transfer to other tasks.
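The "pipeline parallelism" in the title refers to splitting each mini-batch into micro-batches and streaming them through a sequence of model partitions placed on separate accelerators. The following Python sketch is not the authors' implementation; it only simulates that forward-pass schedule for K partitions and M micro-batches (all names here, such as `gpipe_forward_schedule`, are illustrative) and reports the idle "bubble" fraction (K - 1) / (M + K - 1), which shrinks as M grows and is why near-linear speedup is possible.

```python
# Minimal sketch (assumed, not the paper's code): simulate a GPipe-style
# forward schedule in which a mini-batch is split into M micro-batches that
# flow through K pipeline stages on separate accelerators.

def gpipe_forward_schedule(num_stages: int, num_microbatches: int):
    """For each clock tick, list the (stage, micro-batch) pairs that run
    concurrently under the pipelined forward pass."""
    ticks = []
    for t in range(num_stages + num_microbatches - 1):
        active = []
        for stage in range(num_stages):
            mb = t - stage  # micro-batch this stage would process at tick t
            if 0 <= mb < num_microbatches:
                active.append((stage, mb))
        ticks.append(active)
    return ticks


if __name__ == "__main__":
    K, M = 4, 8  # 4 partitions, 8 micro-batches per mini-batch (illustrative)
    for t, active in enumerate(gpipe_forward_schedule(K, M)):
        print(f"tick {t:2d}: " + "  ".join(f"S{s}<-mb{m}" for s, m in active))

    # Idle ("bubble") fraction of the pipeline: (K - 1) / (M + K - 1).
    # It approaches 0 as M grows, so throughput approaches a K-fold speedup.
    bubble = (K - 1) / (M + K - 1)
    print(f"bubble fraction: {bubble:.2f}")
```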
Saved in:
Published in: | arXiv.org 2019-07 |
---|---|
Main authors: | Huang, Yanping; Cheng, Youlong; Bapna, Ankur; Firat, Orhan; Mia Xu Chen; Chen, Dehao; Lee, HyoukJoong; Ngiam, Jiquan; Le, Quoc V; Wu, Yonghui; Chen, Zhifeng |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
container_title | arXiv.org |
---|---|
creator | Huang, Yanping; Cheng, Youlong; Bapna, Ankur; Firat, Orhan; Mia Xu Chen; Chen, Dehao; Lee, HyoukJoong; Ngiam, Jiquan; Le, Quoc V; Wu, Yonghui; Chen, Zhifeng |
description | Scaling up deep neural network capacity has been known as an effective approach to improving model quality for several different machine learning tasks. In many cases, increasing model capacity beyond the memory limit of a single accelerator has required developing special algorithms or infrastructure. These solutions are often architecture-specific and do not transfer to other tasks. To address the need for efficient and task-independent model parallelism, we introduce GPipe, a pipeline parallelism library that allows scaling any network that can be expressed as a sequence of layers. By pipelining different sub-sequences of layers on separate accelerators, GPipe provides the flexibility of scaling a variety of different networks to gigantic sizes efficiently. Moreover, GPipe utilizes a novel batch-splitting pipelining algorithm, resulting in almost linear speedup when a model is partitioned across multiple accelerators. We demonstrate the advantages of GPipe by training large-scale neural networks on two different tasks with distinct network architectures: (i) Image Classification: We train a 557-million-parameter AmoebaNet model and attain a top-1 accuracy of 84.4% on ImageNet-2012, (ii) Multilingual Neural Machine Translation: We train a single 6-billion-parameter, 128-layer Transformer model on a corpus spanning over 100 languages and achieve better quality than all bilingual models. |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2019-07 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2135414327 |
source | Free E-Journals |
subjects | Accelerators; Accuracy; Image classification; Mathematical models; Neural networks; Parameters; Partitions; State of the art; Training |
title | GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-07T12%3A23%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=GPipe:%20Efficient%20Training%20of%20Giant%20Neural%20Networks%20using%20Pipeline%20Parallelism&rft.jtitle=arXiv.org&rft.au=Huang,%20Yanping&rft.date=2019-07-25&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2135414327%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2135414327&rft_id=info:pmid/&rfr_iscdi=true |