Training giant neural networks using pipeline parallelism

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training giant neural networks. One of the methods includes obtaining data specifying a partitioning of the neural network into N composite layers that form a sequence of composite layers, wherein eac...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Ngiam, Jiquan, Huang, Yanping, Cheng, Youlong, Lee, HyoukJoong, Chen, Dehao, Chen, Zhifeng
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training giant neural networks. One of the methods includes obtaining data specifying a partitioning of the neural network into N composite layers that form a sequence of composite layers, wherein each composite layer comprises a distinct plurality of layers from the multiple network layers of the neural network; obtaining data assigning each of the N composite layers to one or more computing devices from a set of N computing devices; partitioning a mini-batch of training examples into a plurality of micro-batches; and training the neural network, comprising: performing a forward pass through the neural network until output activations have been computed for each micro-batch for a final composite layer in the sequence, and performing a backward pass through the neural network until output gradients have been computed for each micro-batch for the first composite layer in the sequence.