Intelligent compute resource selection for machine learning training jobs


Bibliographic details
Main authors: Smola, Alexander Johannes; Stefani, Stefano; Liberty, Edo; Sivasubramanian, Swaminathan; Anjaneyapura Range, Gowda Dayananda; Wiley, Craig; Faulhaber, Jr., Thomas Albert; Karnin, Zohar; Sadoughi, Amir
Format: Patent
Language: English
Description
Summary: Techniques for intelligent compute resource selection and utilization for machine learning training jobs are described. At least a portion of a machine learning (ML) training job is executed a plurality of times using a plurality of different resource configurations, where each of the plurality of resource configurations includes at least a different type or amount of compute instances. A performance metric is measured for each of the plurality of the executions, and can be used along with a desired performance characteristic to generate a recommended resource configuration for the ML training job. The ML training job is executed using the recommended resource configuration.
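
The selection loop outlined in the summary can be illustrated with a minimal Python sketch. This is not the patented method, only one possible reading of it under stated assumptions: the performance metric is taken to be throughput in samples per second, the desired performance characteristic is taken to be a minimum throughput, and ties among eligible configurations are broken by lowest cost. All names (`ResourceConfig`, `TrialResult`, `profile`, `recommend`) and the price field are hypothetical and do not come from the record.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Optional


@dataclass(frozen=True)
class ResourceConfig:
    """Hypothetical resource configuration: an instance type and a count."""
    instance_type: str
    instance_count: int
    hourly_cost_per_instance: float  # assumed price, for illustration only


@dataclass(frozen=True)
class TrialResult:
    """One partial execution of the training job on one configuration."""
    config: ResourceConfig
    samples_per_second: float  # assumed performance metric

    @property
    def cost_per_million_samples(self) -> float:
        # Total hourly price of the configuration times the hours
        # needed to process one million samples at the measured rate.
        hourly_cost = (self.config.hourly_cost_per_instance
                       * self.config.instance_count)
        hours = 1_000_000 / self.samples_per_second / 3600
        return hourly_cost * hours


def profile(configs: Iterable[ResourceConfig],
            run_partial_job: Callable[[ResourceConfig], float]) -> List[TrialResult]:
    """Run a portion of the job once per configuration and record the metric.

    `run_partial_job` is a caller-supplied stand-in that executes part of the
    training job on the given configuration and returns samples per second.
    """
    return [TrialResult(c, run_partial_job(c)) for c in configs]


def recommend(results: Iterable[TrialResult],
              min_samples_per_second: float) -> Optional[ResourceConfig]:
    """Return the cheapest configuration that meets the throughput target.

    Returns None when no measured configuration meets the target.
    """
    eligible = [r for r in results
                if r.samples_per_second >= min_samples_per_second]
    if not eligible:
        return None
    return min(eligible, key=lambda r: r.cost_per_million_samples).config
```

A caller would pass candidate configurations and a measurement function to `profile`, then hand the results and a target to `recommend`, and finally launch the full training job on the returned configuration. Other desired characteristics (for example a deadline or a budget cap) would change only the filter and the ranking key.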