Distributed Machine Learning System

Bibliographic details
Main authors: Srinivasan, Nikhil Vikram; Kern, Alexander Simon
Format: Patent
Language: English
Description
Abstract: A distributed machine learning system and method are disclosed. According to some implementations of this disclosure, the method includes identifying one or more available computing resources and receiving a task object that indicates a training job to perform. The method includes retrieving a container image based on the type of model architecture; the container image includes the model architecture and a filesystem. The method includes retrieving a base model and mounting it to the filesystem of the container image. The method further includes retrieving a volume of training data and mounting it to the filesystem of the container image to obtain a training container. In some implementations, the method further includes executing the training container on at least one of the one or more available computing resources and receiving a trained model from the training container after it completes the training job. The method further includes storing the trained model.
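The steps in the abstract can be sketched as a single in-process pipeline. This is a minimal illustration under stated assumptions, not the patented implementation: the names `TaskObject`, `ComputeResource`, `TrainingContainer`, `IMAGE_REGISTRY`, the image names, the mount paths `/models/base` and `/data/train`, and the `fake_execute` stand-in for a container runtime are all hypothetical, and the "first available resource" placement policy is an arbitrary choice.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical registry: model-architecture type -> container image name.
# Image names are illustrative placeholders, not real images.
IMAGE_REGISTRY: Dict[str, str] = {
    "transformer": "registry.example/train-transformer:latest",
    "cnn": "registry.example/train-cnn:latest",
}


@dataclass
class ComputeResource:
    name: str
    gpus: int
    busy: bool = False


@dataclass
class TaskObject:
    """Indicates the training job to perform (all fields are assumptions)."""
    job_id: str
    architecture_type: str
    base_model_uri: str
    training_data_uri: str


@dataclass
class TrainingContainer:
    image: str
    # Mount point in the container filesystem -> source location.
    mounts: Dict[str, str] = field(default_factory=dict)


# An executor runs a container on a resource and returns the trained model.
Executor = Callable[[TrainingContainer, ComputeResource], dict]


def identify_available(resources: List[ComputeResource]) -> List[ComputeResource]:
    """Step 1: identify the computing resources that are not currently busy."""
    return [r for r in resources if not r.busy]


def build_training_container(task: TaskObject) -> TrainingContainer:
    """Steps 2-4: select the image by architecture type, then mount inputs."""
    if task.architecture_type not in IMAGE_REGISTRY:
        raise ValueError(f"no image for architecture {task.architecture_type!r}")
    container = TrainingContainer(image=IMAGE_REGISTRY[task.architecture_type])
    container.mounts["/models/base"] = task.base_model_uri      # base model
    container.mounts["/data/train"] = task.training_data_uri    # training data
    return container


def run_training_job(task: TaskObject, resources: List[ComputeResource],
                     execute: Executor, model_store: Dict[str, dict]) -> dict:
    """Steps 5-7: execute on an available resource, receive and store the model."""
    available = identify_available(resources)
    if not available:
        raise RuntimeError("no available computing resources")
    container = build_training_container(task)
    resource = available[0]  # simplest placement policy (an assumption)
    resource.busy = True
    try:
        trained = execute(container, resource)  # returns when the job completes
    finally:
        resource.busy = False
    model_store[task.job_id] = trained
    return trained


def fake_execute(container: TrainingContainer, resource: ComputeResource) -> dict:
    # Stand-in for a real container runtime: reports what it would have run.
    return {"image": container.image, "trained_on": resource.name,
            "base": container.mounts["/models/base"]}


resources = [ComputeResource("node-a", gpus=8, busy=True),
             ComputeResource("node-b", gpus=4)]
task = TaskObject("job-1", "transformer", "models/base-v1", "datasets/train-v1")
store: Dict[str, dict] = {}
model = run_training_job(task, resources, fake_execute, store)
print(model["trained_on"])  # node-b
```

A real system would replace `fake_execute` with a call into a container runtime or orchestrator and would persist the model to durable storage instead of a dictionary; passing the executor in as a function keeps the orchestration logic independent of that choice.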