DISTRIBUTED TRAINING METHOD AND SYSTEM, DEVICE AND STORAGE MEDIUM


Bibliographic details
Main authors: LIU, Yi; YU, Dianhai; GONG, Weibao; MA, Yanjun; WANG, Haifeng; DONG, Daxiang
Format: Patent
Language: English; French; German
Description
Summary: The present application discloses a distributed training method and system, a device, and a storage medium, and relates to the technical field of artificial intelligence, in particular to deep learning and cloud computing. The method includes: sending, by a task information server, a first training request and information on an available first computing server to at least a first data server among a plurality of data servers; sending, by the first data server, a first batch of training data to the first computing server according to the first training request; and performing, by the first computing server, model training according to the first batch of training data, sending model parameters to the first data server for storage after the training is completed, and sending identification information of the first batch of training data to the task information server for recording; wherein the model parameters are not stored at any of the computing servers. Embodiments of the present application can realize an efficient training procedure under flexibly changing computing resources.
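
The message flow in the summary can be illustrated with a small in-memory sketch. It is not the claimed implementation: the class names (`TaskInformationServer`, `DataServer`, `ComputingServer`), the round-robin choice of computing server, the one-weight linear model, and the step in which the data server hands the currently stored parameters to the computing server together with the batch are all assumptions made for illustration. The only points taken from the summary are that the data server stores the parameters, the task information server records the batch identification, and the computing servers keep no parameters.

```python
from dataclasses import dataclass, field


class ComputingServer:
    """Stateless worker: it holds no model parameters between calls."""

    def __init__(self, name, lr=0.1):
        self.name = name
        self.lr = lr

    def train(self, batch_id, batch, params, task_server):
        # Hypothetical model: y = w * x, trained by per-sample gradient descent.
        w = params["w"]
        for x, y in batch:
            grad = 2.0 * (w * x - y) * x
            w -= self.lr * grad
        # Report the identification of the finished batch for recording.
        task_server.record(batch_id)
        # Return the parameters to the caller instead of keeping them locally.
        return {"w": w}


@dataclass
class DataServer:
    """Holds training batches and the stored model parameters."""

    batches: dict  # batch_id -> list of (x, y) samples
    params: dict = field(default_factory=lambda: {"w": 0.0})

    def handle_training_request(self, batch_id, computing_server, task_server):
        # Assumption: the stored parameters travel with the batch.
        new_params = computing_server.train(
            batch_id, self.batches[batch_id], dict(self.params), task_server
        )
        # Store the parameters received after training is completed.
        self.params = new_params


class TaskInformationServer:
    """Tracks available computing servers and completed batches."""

    def __init__(self, computing_servers):
        self.available = list(computing_servers)
        self.completed = []

    def record(self, batch_id):
        self.completed.append(batch_id)

    def dispatch(self, data_server, batch_id):
        # Assumption: simple round-robin over the available computing servers.
        worker = self.available[len(self.completed) % len(self.available)]
        data_server.handle_training_request(batch_id, worker, self)


workers = [ComputingServer("c0"), ComputingServer("c1")]
task_server = TaskInformationServer(workers)
data_server = DataServer(batches={"b0": [(1.0, 2.0)], "b1": [(2.0, 4.0)]})

for batch_id in ("b0", "b1"):
    task_server.dispatch(data_server, batch_id)

print(task_server.completed, data_server.params)
```

In this sketch the second batch is handled by a different computing server than the first, yet training continues from the parameters stored on the data server. That is the property the summary relies on: a computing server can be added or withdrawn between batches without losing training state.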