Distributed Machine Learning System
Main authors: | , |
---|---|
Format: | Patent |
Language: | eng |
Subjects: | |
Summary: | A distributed machine learning system and method are disclosed. According to some implementations of this disclosure, the method includes identifying one or more available computing resources and receiving a task object that indicates a training job to perform. The method includes retrieving a container image based on the type of model architecture; the container image includes the model architecture and a filesystem. The method then includes retrieving a base model and mounting it to the filesystem of the container image, and retrieving a volume of training data and mounting it to the same filesystem, which yields a training container. In some implementations, the method further includes executing the training container on at least one of the available computing resources and receiving a trained model from the container after the container completes the training job. The method further includes storing the trained model. |
---|
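The sequence of steps in the summary can be illustrated with a small in-memory sketch. This is not the patented implementation: every name here (`TaskObject`, `Container`, `IMAGE_REGISTRY`, `MODEL_STORE`, `DATA_STORE`, `run_training_job`) is hypothetical, the "container" is a plain Python object rather than a real container runtime, and the "training" step is a toy update chosen only so the flow can be run end to end.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class TaskObject:
    """Hypothetical task object describing one training job."""
    architecture: str   # type of model architecture
    base_model: str     # key of the base model to start from
    training_data: str  # key of the training-data volume


@dataclass
class Container:
    """Hypothetical stand-in for a container image and its filesystem."""
    architecture: str
    train_fn: Callable[[float, List[float]], float]
    filesystem: Dict[str, object] = field(default_factory=dict)

    def mount(self, path: str, payload: object) -> None:
        # "Mounting" is modelled as placing a payload at a path.
        self.filesystem[path] = payload

    def run(self) -> float:
        # Execute the training job on whatever has been mounted.
        return self.train_fn(self.filesystem["/model"], self.filesystem["/data"])


def toy_train(base: float, data: List[float]) -> float:
    # Toy "training": move the base value halfway toward the data mean.
    return base + 0.5 * (sum(data) / len(data) - base)


# Assumed stores, keyed by name; a real system would use registries or object storage.
IMAGE_REGISTRY = {"toy-regressor": toy_train}
MODEL_STORE = {"base-v1": 0.0}
DATA_STORE = {"dataset-a": [2.0, 4.0]}


def run_training_job(
    task: TaskObject,
    resources: List[dict],
    trained_store: Dict[str, float],
) -> Tuple[str, str]:
    # 1. Identify available computing resources.
    available = [r for r in resources if r["available"]]
    if not available:
        raise RuntimeError("no computing resource is available")

    # 2. Retrieve a container image based on the model architecture type.
    container = Container(task.architecture, IMAGE_REGISTRY[task.architecture])

    # 3. Mount the base model, then the training data, to obtain a training container.
    container.mount("/model", MODEL_STORE[task.base_model])
    container.mount("/data", DATA_STORE[task.training_data])

    # 4. Execute on one available resource and receive the trained model.
    node = available[0]["name"]
    trained_model = container.run()

    # 5. Store the trained model.
    key = f"{task.base_model}-trained"
    trained_store[key] = trained_model
    return node, key
```

A call such as `run_training_job(TaskObject("toy-regressor", "base-v1", "dataset-a"), resources, store)` returns the chosen node name and the storage key, and leaves the trained value in `store`. Keeping the image, base model, and data as separately retrieved pieces mirrors the summary's point that the training container is assembled per job rather than built ahead of time.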