INT8 offline quantization and integer inference method based on the Transformer model
Saved in:
Main authors: , , ,
Format: Patent
Language: Chinese; English
Subjects:
Online access: Order full text
Summary: The invention provides an INT8 offline quantization and integer inference method based on the Transformer model, comprising the following steps: converting the L2 norm of the normalization layer in the original Transformer floating-point model into an L1 norm; training the model; performing forward inference on a small amount of data to obtain the quantization coefficient of the input data for each layer's matrix operation, and extracting that coefficient as general floating-point data; obtaining the weight quantization coefficient of each linear layer in the floating-point model, extracting it as general floating-point data, and determining the optimal weight quantization coefficient for each layer by a mean-square-error calculation method; converting the quantization coefficients involved in quantization operations during inference into 2^-n floating-point form, an…
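The "mean square error calculation method" for picking a weight quantization coefficient can be illustrated with a short sketch. This is not the patent's implementation; the candidate-scale grid, symmetric INT8 range, and per-tensor granularity are all assumptions made for the example:

```python
# Illustrative MSE-based search for an INT8 weight quantization scale.
# Assumptions (not from the patent): symmetric quantization to
# [-127, 127], per-tensor scale, a linear grid of candidate scales.
import numpy as np

def quantize(w, scale):
    """Symmetric INT8 fake-quantization: scale, round, clamp, rescale."""
    q = np.clip(np.round(w / scale), -127, 127)
    return q * scale  # dequantized reconstruction for error measurement

def best_scale(w, n_candidates=100):
    """Return the candidate scale minimizing reconstruction MSE."""
    max_abs = np.max(np.abs(w))
    best, best_mse = None, np.inf
    for frac in np.linspace(0.5, 1.0, n_candidates):
        scale = frac * max_abs / 127.0
        mse = np.mean((w - quantize(w, scale)) ** 2)
        if mse < best_mse:
            best, best_mse = scale, mse
    return best

# Hypothetical weight tensor for one linear layer.
w = np.random.randn(256, 256).astype(np.float32)
s = best_scale(w)
```

The same search would be run independently for each linear layer, yielding one "optimal weight quantization coefficient" per layer as the abstract describes.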
---|
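The "2^-n floating-point number form" step can be understood as restricting requantization multipliers to powers of two, so that applying them at inference time is a pure integer shift rather than a floating-point multiply. The rounding strategy below is an assumption for illustration, not taken from the patent:

```python
# Illustrative conversion of a requantization multiplier M (0 < M < 1)
# to the nearest power of two 2^-n, applied as an integer right shift.
import math

def to_power_of_two(m):
    """Return n such that 2**-n is the power of two closest to m."""
    return round(-math.log2(m))

def requantize(acc, n):
    """Apply 2^-n to an INT32 accumulator with round-half-up shifting."""
    return (acc + (1 << (n - 1))) >> n

n = to_power_of_two(0.0123)   # 0.0123 is about 2^-6.35, so n = 6
out = requantize(12345, n)    # 12345 * 2^-6 with rounding
```

Because the shift replaces the multiply, the entire inference path after matrix accumulation can stay in integer arithmetic, which is the point of extracting the coefficients offline.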