COMPUTER SYSTEM FOR DISTRIBUTED MACHINE LEARNING
Format: | Patent |
---|---|
Language: | English |
Abstract: | A computer system for distributed training of a machine learning model comprises a BSP system, at least one machine learning module, and a shared memory module. The BSP system includes a central BSP control module and at least one local BSP module. The central BSP control module is configured to instruct the at least one local BSP module to store a local model in its associated shared memory module. The at least one machine learning module is configured to read the local model from its associated shared memory module, compute a gradient based on the local model, and aggregate the gradient, immediately after computing it, into an aggregated gradient in its associated shared memory module. The central BSP control module is further configured to instruct the at least one local BSP module to periodically read out its associated shared memory module. |
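The data flow described in the abstract can be illustrated with a small single-process sketch. It is not the patented implementation. It assumes that BSP stands for bulk synchronous parallel, that a read-out resets the aggregated gradient, and that the central module averages the read-out gradients to update the model (the abstract does not say what happens after the read-out). The names `SharedMemory`, `ml_module`, and `train` are hypothetical, threads stand in for machine learning modules, and the periodic read-out is simplified to one read-out per superstep.

```python
import threading


class SharedMemory:
    """Hypothetical stand-in for one node's shared memory module."""

    def __init__(self, dim):
        self._lock = threading.Lock()
        self._model = [0.0] * dim
        self._grad = [0.0] * dim
        self._count = 0

    def store_model(self, model):
        # Done by the local BSP module on instruction from the central module.
        with self._lock:
            self._model = list(model)

    def read_model(self):
        with self._lock:
            return list(self._model)

    def aggregate(self, grad):
        # Called by a machine learning module right after computing a gradient.
        with self._lock:
            self._grad = [a + g for a, g in zip(self._grad, grad)]
            self._count += 1

    def read_out(self):
        # Assumption: reading out also resets the accumulator.
        with self._lock:
            grad, count = self._grad, self._count
            self._grad = [0.0] * len(grad)
            self._count = 0
            return grad, count


def ml_module(shm, samples):
    """Read the local model, compute a gradient, aggregate it immediately."""
    for x, y in samples:
        w = shm.read_model()
        err = sum(wi * xi for wi, xi in zip(w, x)) - y
        shm.aggregate([err * xi for xi in x])  # squared-error gradient


def train(node_data, dim, supersteps=50, lr=0.5):
    """Central control loop: store model, run modules, read out, update."""
    model = [0.0] * dim
    nodes = [SharedMemory(dim) for _ in node_data]
    for _ in range(supersteps):
        for shm in nodes:
            shm.store_model(model)
        threads = [
            threading.Thread(target=ml_module, args=(shm, shard))
            for shm, shards in zip(nodes, node_data)
            for shard in shards
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()  # barrier that ends the superstep
        total, n = [0.0] * dim, 0
        for shm in nodes:
            grad, count = shm.read_out()
            total = [a + g for a, g in zip(total, grad)]
            n += count
        # Assumed update rule: plain gradient descent on the averaged gradient.
        model = [m - lr * g / n for m, g in zip(model, total)]
    return model
```

For example, with two nodes that each run two machine learning modules on samples of y = 3x + 1 (inputs carry a constant 1.0 as a bias feature), `train([[[((-1.5, 1.0), -3.5)], [((-0.5, 1.0), -0.5)]], [[((0.5, 1.0), 2.5)], [((1.5, 1.0), 5.5)]]], dim=2)` should return weights close to `[3.0, 1.0]`. Because each module adds its gradient to the shared accumulator as soon as it is computed, the local BSP module only ever reads a single aggregated value per node.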