EFFICIENT QUANTIZATION FOR NEURAL NETWORK DEPLOYMENT AND EXECUTION
Implementations disclosed describe methods and systems to perform the methods of deploying and executing machine learning models on target-specific computational platforms. Optimization techniques include but are not limited to alignment of kernel operations with hardware instructions of a target pr...
Gespeichert in:
Hauptverfasser: | , , |
---|---|
Format: | Patent |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Implementations disclosed describe methods and systems to perform the methods of deploying and executing machine learning models on target-specific computational platforms. Optimization techniques include but are not limited to alignment of kernel operations with hardware instructions of a target processing device, reduction of kernel dimensions near boundaries of data, efficient reuse of a small number of memory components during neural network operations, run-time quantization of data and neural network parameters, and other methods. |
---|