INT8 offline quantization and integer inference method based on Transform model

Bibliographic details
Main authors: FANG ZHONGHONG, HE KUN, DENG HANKE, JIANG XIAOBO
Format: Patent
Language: chi; eng
Description
Summary: The invention provides an INT8 offline quantization and integer inference method based on a Transform model. The method comprises the following steps: converting the L2 norm of the normalization layer in the original Transform floating-point model into an L1 norm; training the model; performing forward inference on a small amount of data to obtain a quantization coefficient for the input data of each layer's matrix operation, and extracting that coefficient as general floating-point data; obtaining a weight quantization coefficient for each linear layer in the floating-point model, extracting it as general floating-point data, and determining the optimal weight quantization coefficient in each layer by a mean-square-error calculation; converting the quantization coefficients involved in quantization operations during inference into floating-point numbers of the form 2^-n, an
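
The weight-quantization and 2^-n steps named in the summary can be illustrated with a minimal Python sketch. It is not the patented procedure: the candidate search over clipping thresholds, the symmetric [-127, 127] range, and all function names (`best_scale_by_mse`, `to_power_of_two`, `rescale_by_shift`) are assumptions made for illustration. The shift-based rescaling is likewise an assumed reason for preferring 2^-n coefficients, not something the record states.

```python
import math


def quantize(values, scale):
    """Symmetric INT8 quantization: round(v / scale), clamped to [-127, 127]."""
    return [max(-127, min(127, round(v / scale))) for v in values]


def dequantize(q_values, scale):
    """Map INT8 codes back to floating point."""
    return [q * scale for q in q_values]


def mse(a, b):
    """Mean square error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def best_scale_by_mse(weights, num_candidates=100):
    """Try several clipping thresholds (assumed search strategy) and keep the
    scale whose quantize/dequantize round trip has the lowest MSE.
    Assumes at least one weight is non-zero."""
    max_abs = max(abs(w) for w in weights)
    best_scale, best_err = None, float("inf")
    for i in range(1, num_candidates + 1):
        clip = max_abs * (i / num_candidates)
        scale = clip / 127.0
        err = mse(weights, dequantize(quantize(weights, scale), scale))
        if err < best_err:
            best_scale, best_err = scale, err
    return best_scale, best_err


def to_power_of_two(scale):
    """Round a positive scale to the nearest value of the form 2**-n
    (nearest in the log2 domain). Returns (n, 2**-n)."""
    n = round(-math.log2(scale))
    return n, 2.0 ** (-n)


def rescale_by_shift(acc, n):
    """Multiply a non-negative integer accumulator by 2**-n (n >= 1)
    using a rounding right shift instead of a floating-point multiply."""
    return (acc + (1 << (n - 1))) >> n


if __name__ == "__main__":
    weights = [0.02, -0.5, 0.13, 0.31, -0.07, 1.9, -0.26, 0.44]
    scale, err = best_scale_by_mse(weights)
    n, pow2_scale = to_power_of_two(scale)
    print("MSE-selected scale:", scale, "error:", err)
    print("power-of-two scale: 2^-%d = %s" % (n, pow2_scale))
    print("INT8 weights:", quantize(weights, pow2_scale))
```

Rounding the scale to 2^-n usually costs some accuracy relative to the MSE-selected floating-point scale; the sketch makes no claim about how the patent handles that trade-off.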