Method, System, and Computer Program Product for Dynamically Scheduling Machine Learning Inference Jobs with Different Quality of Services on a Shared Infrastructure
A method, system, and computer program product for dynamically scheduling machine learning inference jobs receive or determine a plurality of performance profiles associated with a plurality of system resources, wherein each performance profile is associated with a machine learning model; receive a...
Gespeichert in:
Hauptverfasser: | , , , , , |
---|---|
Format: | Patent |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | A method, system, and computer program product for dynamically scheduling machine learning inference jobs receive or determine a plurality of performance profiles associated with a plurality of system resources, wherein each performance profile is associated with a machine learning model; receive a request for system resources for an inference job associated with the machine learning model; determine a system resource of the plurality of system resources for processing the inference job associated with the machine learning model based on the plurality of performance profiles and a quality of service requirement associated with the inference job; assign the system resource to the inference job for processing the inference job; receive result data associated with processing of the inference job with the system resource; and update based on the result data, a performance profile of the plurality of the performance profiles associated with the system resource and the machine learning model. |
---|