DEEP NEURAL NETWORKS (DNN) INFERENCE USING PRACTICAL EARLY EXIT NETWORKS

The present disclosure relates to methods and systems for providing inferences using machine learning systems. The methods and systems receive a load forecast for processing requests by a machine learning model and split the machine learning model into a plurality machine learning model portions bas...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: GANDHI, Swapnil Sunilkumar, PADMANABHA IYER, Anand
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:The present disclosure relates to methods and systems for providing inferences using machine learning systems. The methods and systems receive a load forecast for processing requests by a machine learning model and split the machine learning model into a plurality machine learning model portions based on the load forecast. The methods and systems determine a batch size for the requests for the machine learning model portions. The methods and systems use one or more available resources to execute the plurality of machine learning model portions to process the requests and generate inferences for the requests.