COMPUTER SYSTEM FOR DISTRIBUTED MACHINE LEARNING

Bibliographic Details
Main authors: PETERFREUND, Natan; WU, Zuguang; VERNER, Uri; TALYANSKY, Roman; MELAMED, Zach
Format: Patent
Languages: English; French; German
Description
Summary: A computer system for distributed training of a machine learning model comprises a BSP system, at least one machine learning module, and a shared memory module. The BSP system includes a central BSP control module and at least one local BSP module. The central BSP control module is configured to instruct the at least one local BSP module to store a local model in its associated shared memory module. The at least one machine learning module is configured to read the local model from its associated shared memory module, compute a gradient based on the local model, and, immediately after computing it, aggregate the gradient into an aggregated gradient held in its associated shared memory module. The central BSP control module is further configured to instruct the at least one local BSP module to periodically read out its associated shared memory module.
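
The data flow in the summary can be illustrated with a minimal single-process sketch. It is not the patented implementation. Every name in it (`SharedMemory`, `ml_worker`, `run_superstep`, `compute_gradient`) is hypothetical, threads stand in for machine learning modules, a lock-protected object stands in for the shared memory module, and a toy squared-error gradient replaces the real model. Two further assumptions go beyond the summary: the read-out resets the aggregated gradient, and it happens once after all workers finish rather than on a timer.

```python
import threading


class SharedMemory:
    """Hypothetical stand-in for the shared memory module."""

    def __init__(self, dim):
        self.lock = threading.Lock()
        self.local_model = [0.0] * dim
        self.aggregated_gradient = [0.0] * dim
        self.num_gradients = 0

    def store_model(self, model):
        # Called by the local BSP module when instructed by the central module.
        with self.lock:
            self.local_model = list(model)

    def read_model(self):
        # Called by a machine learning module before computing a gradient.
        with self.lock:
            return list(self.local_model)

    def aggregate(self, gradient):
        # Adds one gradient into the running sum held in shared memory.
        with self.lock:
            for i, g in enumerate(gradient):
                self.aggregated_gradient[i] += g
            self.num_gradients += 1

    def read_out(self):
        # Returns the aggregated gradient and how many gradients it contains.
        # Resetting afterwards is an assumption, not stated in the summary.
        with self.lock:
            total, count = self.aggregated_gradient, self.num_gradients
            self.aggregated_gradient = [0.0] * len(total)
            self.num_gradients = 0
            return total, count


def compute_gradient(model, x, y):
    # Toy gradient of 0.5 * (w.x - y)^2 with respect to w.
    err = sum(w * xi for w, xi in zip(model, x)) - y
    return [err * xi for xi in x]


def ml_worker(shm, samples):
    # One machine learning module: read model, compute, aggregate immediately.
    for x, y in samples:
        model = shm.read_model()
        grad = compute_gradient(model, x, y)
        shm.aggregate(grad)


def run_superstep(shm, model, shards):
    # Local BSP module: store the model, let workers run, then read out.
    shm.store_model(model)
    workers = [threading.Thread(target=ml_worker, args=(shm, s)) for s in shards]
    for w in workers:
        w.start()
    for w in workers:
        w.join()
    return shm.read_out()
```

Calling `run_superstep(SharedMemory(2), [0.0, 0.0], shards)` with one list of `(x, y)` samples per worker returns the summed gradient and the number of contributions. Because each worker adds its gradient to the shared sum as soon as it is computed, only a single vector per shared memory module has to be read out, rather than one gradient per worker.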