TASK PROCESSING METHOD AND APPARATUS BASED ON MODEL QUANTIZATION, AND DEVICE AND STORAGE MEDIUM
Main authors: | , , |
---|---|
Format: | Patent |
Language: | chi ; eng ; fre |
Subjects: | |
Online access: | Order full text |
Summary: | Provided in the present disclosure are a task processing method and apparatus based on model quantization, and a device and a storage medium. The task processing method comprises: according to a first difference between a first quantization output of an optimization unit in a transformer model and a first floating-point output of same, updating a weight quantization coefficient of the optimization unit and an activation quantization coefficient of same; according to a second difference between a second quantization output of the optimization unit and a second floating-point output of same, updating a weight quantization increment of the optimization unit; determining a weight quantization rounding direction for the optimization unit according to a target weight quantization increment, and performing quantization on a weight parameter of the optimization unit according to a target weight quantization coefficient and the weight quantization rounding direction; and performing forward reasoning calculation on inp |
---|
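
The summary names three ingredients: a quantization coefficient (scale), a per-weight quantization increment that decides the rounding direction, and a difference between quantized and floating-point outputs that drives the updates. The following is a minimal illustrative sketch of how these pieces could fit together, not the patented method. The function names, the 0.5 threshold for rounding up, the symmetric 8-bit grid, the mean-squared-error difference, and the single matrix multiplication standing in for an "optimization unit" are all assumptions; the record does not specify them.

```python
import numpy as np


def quantize_weights(w, scale, increment, n_bits=8):
    """Fake-quantize weights with a scale and a per-weight rounding direction.

    Assumption: an increment >= 0.5 means "round up", otherwise "round down".
    """
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    round_up = (increment >= 0.5).astype(w.dtype)
    q = np.clip(np.floor(w / scale) + round_up, qmin, qmax)
    return q * scale  # dequantized values on the quantization grid


def quantize_activations(x, scale, n_bits=8):
    """Fake-quantize activations with round-to-nearest and a scale."""
    qmin, qmax = -(2 ** (n_bits - 1)), 2 ** (n_bits - 1) - 1
    return np.clip(np.round(x / scale), qmin, qmax) * scale


def output_difference(x, w, w_q, x_scale):
    """Mean squared difference between quantized and floating-point outputs.

    Assumption: the unit is a single linear map x @ w.
    """
    fp_out = x @ w
    q_out = quantize_activations(x, x_scale) @ w_q
    return float(np.mean((q_out - fp_out) ** 2))


# Hypothetical usage with random data.
rng = np.random.default_rng(0)
w = rng.normal(size=(16, 4))
x = rng.normal(size=(8, 16))
w_scale = np.abs(w).max() / 127
x_scale = np.abs(x).max() / 127

# Start the increment at the fractional part, which reproduces
# round-to-nearest; a calibration loop would then adjust it.
increment = w / w_scale - np.floor(w / w_scale)
w_q = quantize_weights(w, w_scale, increment)
diff = output_difference(x, w, w_q, x_scale)
```

In a calibration loop, a difference such as `diff` would be minimized first with respect to the scales and then with respect to the increment, after which the rounding direction is frozen. The order of those two stages follows the summary; the optimizer and loss details are not given in this record.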