METHODS AND APPARATUS FOR MACHINE LEARNING MODEL OPTIMIZATION
Main authors: | |
---|---|
Format: | Patent |
Language: | eng |
Subjects: | |
Summary: | This application relates to apparatus and methods for training machine learning models. In some examples, a pool of worker pods is generated to execute tasks that train a machine learning model. A master assigns tasks to the worker pods through a work queue, and each worker pod provides its output through a results queue. The embodiments may operate with less reliable storage, such as object stores, which may be less costly than other types of storage mechanisms. To operate in less reliable environments, each worker pod can include a checkpoint mechanism that recovers from interruptions, such as those caused by node failure or preemption. For example, when a task is interrupted, the checkpoint mechanism may allow a worker pod to continue processing the task from its last checkpoint. Processing results are provided to a results queue when a task completes. |
---|
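
The flow described in the summary (a master, a work queue, a results queue, and workers that resume from a checkpoint) can be illustrated with a small single-process sketch. This is not the patented implementation. Every name here (`run_task`, `worker`, `master`, `Preempted`, the `ckpt/` key prefix) is hypothetical, a plain dict stands in for the object store, and preemption is simulated by raising an exception at a chosen item.

```python
import queue


class Preempted(Exception):
    """Simulated interruption, standing in for node failure or preemption."""


def run_task(task, store, fail_at=None):
    """Sum the squares of task['items'], checkpointing after every item.

    `store` is a dict standing in for an object store (an assumption).
    `fail_at` is a hypothetical test hook: raise Preempted before that index.
    """
    key = f"ckpt/{task['id']}"
    # Resume from the last checkpoint if one exists, else start fresh.
    state = store.get(key, {"next": 0, "acc": 0})
    for i in range(state["next"], len(task["items"])):
        if fail_at is not None and i == fail_at:
            raise Preempted(task["id"])
        state = {"next": i + 1, "acc": state["acc"] + task["items"][i] ** 2}
        store[key] = state  # persist progress
    store.pop(key, None)  # task finished, checkpoint no longer needed
    return state["acc"]


def worker(work_q, results_q, store, fail_plan):
    """Drain the work queue; publish a result only when a task completes."""
    while True:
        try:
            task = work_q.get_nowait()
        except queue.Empty:
            return
        try:
            # Each task is interrupted at most once in this simulation.
            result = run_task(task, store, fail_plan.pop(task["id"], None))
        except Preempted:
            work_q.put(task)  # re-queue; the retry resumes from the checkpoint
            continue
        results_q.put((task["id"], result))


def master(tasks, fail_plan=None):
    """Enqueue tasks, run one worker, and collect the results queue."""
    work_q, results_q, store = queue.Queue(), queue.Queue(), {}
    for task in tasks:
        work_q.put(task)
    worker(work_q, results_q, store, dict(fail_plan or {}))
    results = {}
    while not results_q.empty():
        task_id, value = results_q.get()
        results[task_id] = value
    return results, store
```

Calling `master([{"id": "a", "items": [1, 2, 3, 4]}], {"a": 2})` interrupts task `a` before its third item; the retry reads the saved state and processes only the remaining two items. A real deployment would run many workers in separate pods and use a durable queue and object store, which this sketch does not attempt to model.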