Method, System, and Computer Program Product for Dynamically Scheduling Machine Learning Inference Jobs with Different Quality of Services on a Shared Infrastructure

A method, system, and computer program product for dynamically scheduling machine learning inference jobs receive or determine a plurality of performance profiles associated with a plurality of system resources, wherein each performance profile is associated with a machine learning model; receive a...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Roy, Subir, Karpenko, Igor, Walker, Peter, Lu, Ranglin, Cheng, Yinhe, Gu, Yu
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:A method, system, and computer program product for dynamically scheduling machine learning inference jobs receive or determine a plurality of performance profiles associated with a plurality of system resources, wherein each performance profile is associated with a machine learning model; receive a request for system resources for an inference job associated with the machine learning model; determine a system resource of the plurality of system resources for processing the inference job associated with the machine learning model based on the plurality of performance profiles and a quality of service requirement associated with the inference job; assign the system resource to the inference job for processing the inference job; receive result data associated with processing of the inference job with the system resource; and update based on the result data, a performance profile of the plurality of the performance profiles associated with the system resource and the machine learning model.