METHOD AND SYSTEM TO PROCESS ASYNCHRONOUS AND DISTRIBUTED TRAINING TASKS

This disclosure relates generally relates to method and system to process asynchronous and distributed training tasks. Training a large-scale deep neural network (DNN) model with large-scale training data is time-consuming. The method creates a work queue (Q) with a set of predefined number of tasks...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: JAIN, Anubhav, SUBBIAH, Ravindran, KALELE, Amit
Format: Patent
Sprache:eng ; fre ; ger
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:This disclosure relates generally relates to method and system to process asynchronous and distributed training tasks. Training a large-scale deep neural network (DNN) model with large-scale training data is time-consuming. The method creates a work queue (Q) with a set of predefined number of tasks comprising a training data. Here, set of central processing units (CPUs) information and a set of graphics processing units (GPUs) information are fetched from the current environment to initiate a parallel process asynchronously on the work queue (Q) to train a set of deep learning models with optimized resources using a data pre-processing technique, to compute a transformed training data and training by using an asynchronous model training technique, the set of deep learning models on each GPU asynchronously with the transformed training data based on a set of asynchronous model parameters.