METHOD AND SYSTEM TO PROCESS ASYNCHRONOUS AND DISTRIBUTED TRAINING TASKS

This disclosure relates generally relates to method and system to process asynchronous and distributed training tasks. Training a large-scale deep neural network (DNN) model with large-scale training data is time-consuming. The method creates a work queue (Q) with a set of predefined number of tasks...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	JAIN, Anubhav, SUBBIAH, Ravindran, KALELE, Amit
Format:	Patent
Sprache:	eng ; fre ; ger
Schlagworte:	CALCULATING COMPUTING COUNTING ELECTRIC DIGITAL DATA PROCESSING PHYSICS
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	This disclosure relates generally relates to method and system to process asynchronous and distributed training tasks. Training a large-scale deep neural network (DNN) model with large-scale training data is time-consuming. The method creates a work queue (Q) with a set of predefined number of tasks comprising a training data. Here, set of central processing units (CPUs) information and a set of graphics processing units (GPUs) information are fetched from the current environment to initiate a parallel process asynchronously on the work queue (Q) to train a set of deep learning models with optimized resources using a data pre-processing technique, to compute a transformed training data and training by using an asynchronous model training technique, the set of deep learning models on each GPU asynchronously with the transformed training data based on a set of asynchronous model parameters.