EFFICIENT QUANTIZATION FOR NEURAL NETWORK DEPLOYMENT AND EXECUTION

Implementations disclosed describe methods and systems to perform the methods of deploying and executing machine learning models on target-specific computational platforms. Optimization techniques include but are not limited to alignment of kernel operations with hardware instructions of a target pr...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Ramanna, Vikram Kumar, Pandey, Ashutosh, Li, Kaiping
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Implementations disclosed describe methods and systems to perform the methods of deploying and executing machine learning models on target-specific computational platforms. Optimization techniques include but are not limited to alignment of kernel operations with hardware instructions of a target processing device, reduction of kernel dimensions near boundaries of data, efficient reuse of a small number of memory components during neural network operations, run-time quantization of data and neural network parameters, and other methods.