METHODS, SYSTEMS, ARTICLES OF MANUFACTURE AND APPARATUS TO IMPROVE DISTRIBUTED MACHINE LEARNING EFFICIENCY

Bibliographic details
Main authors: Jha, Satish; Chen, Kuilin Clark; Merwaday, Arvind; Alam, S M Iftekharul; Sharma Banjade, Vesh Raj
Format: Patent
Language: English
Description
Summary: Methods, apparatus, systems, and articles of manufacture are disclosed to improve distributed machine learning efficiency. An example apparatus includes train management circuitry to cause a first vector to be sent from a worker node to an in-network-aggregator (INA) after completion of a first processing iteration requested by a parameter server. The example apparatus also includes protocol configuration circuitry to prohibit a second processing iteration when an availability status of the INA is false, and permit the second processing iteration when (a) an acknowledgement (ACK) from the INA corresponding to the first vector is received and (b) the availability status of the INA is true.
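
The gating rule in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the names `WorkerState`, `send_vector`, `on_ack`, and `may_start_next_iteration` are hypothetical and chosen only for illustration. The summary does not say what happens when the INA is available but no ACK has arrived yet; the sketch assumes the worker simply waits in that case.

```python
from dataclasses import dataclass


@dataclass
class WorkerState:
    """Hypothetical worker-side view of the INA (field names are assumptions)."""
    ina_available: bool = True   # availability status of the INA
    ack_received: bool = False   # ACK for the most recently sent vector


def send_vector(state: WorkerState) -> None:
    """Model sending a vector to the INA after an iteration completes.

    A freshly sent vector has not been acknowledged yet.
    """
    state.ack_received = False


def on_ack(state: WorkerState) -> None:
    """Model receipt of the INA's ACK for the last sent vector."""
    state.ack_received = True


def may_start_next_iteration(state: WorkerState) -> bool:
    """Return True only when the next processing iteration is permitted."""
    # Prohibit the next iteration whenever the INA is unavailable.
    if not state.ina_available:
        return False
    # Permit it only once the ACK for the previous vector has arrived.
    # Assumption: with no ACK yet, the worker waits (not stated in the summary).
    return state.ack_received


state = WorkerState()
send_vector(state)                           # first iteration done, vector sent
blocked_before_ack = may_start_next_iteration(state)
on_ack(state)                                # INA acknowledges the first vector
allowed_after_ack = may_start_next_iteration(state)
state.ina_available = False                  # INA becomes unavailable
blocked_when_unavailable = may_start_next_iteration(state)
```

In this sketch the availability check comes first, so a received ACK never overrides an unavailable INA, which matches the order of the two conditions in the summary.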