Quantization method of large language model, electronic equipment, chip system and storage medium
Main author: | |
---|---|
Format: | Patent |
Language: | chi ; eng |
Summary: | The invention provides a quantization method for a large language model, an electronic device, a chip system and a storage medium, and relates to the technical field of model quantization. The method groups the weights of the large language model along a dimension using a plurality of grouping sizes. For each grouping size, the quantized element corresponding to each element of a group is calculated from the maximum and minimum values of the elements in that group and the scaling coefficient a of the weight; in this way, the quantized elements of each weight in the large language model are obtained. The same business data is input into the large language model with quantized weights and the large language model with unquantized weights, so that the output difference of the two models converges, and the large language model corresponding to the grouping mode with the minimum |
---|
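
The grouping and min/max steps described in the summary can be illustrated with a minimal sketch. It is not the patented procedure. Several points are assumptions because the record does not specify them: the scaling coefficient a is treated as a factor applied to each group's minimum and maximum, the "model" is reduced to a single matrix-vector product, the output difference is measured as a squared error, and the group size is chosen by the smallest output difference (the summary is cut off before it states the selection criterion). The convergence step, in which the difference between the two models is driven down, is omitted. All function names are hypothetical.

```python
def quantize_group(group, a=1.0, bits=4):
    """Min-max quantize one group and return the dequantized values.

    Assumption: `a` scales the group's min and max (a clipping factor);
    the source only says it is "the scaling coefficient a of the weight".
    """
    lo, hi = a * min(group), a * max(group)
    levels = 2 ** bits - 1
    step = (hi - lo) / levels if hi > lo else 1.0
    out = []
    for w in group:
        q = round((w - lo) / step)       # integer code
        q = max(0, min(levels, q))       # clamp to the representable range
        out.append(q * step + lo)        # map the code back to a real value
    return out


def quantize_row(row, group_size, a=1.0, bits=4):
    """Split one weight row into consecutive groups and quantize each group."""
    out = []
    for i in range(0, len(row), group_size):
        out.extend(quantize_group(row[i:i + group_size], a, bits))
    return out


def matvec(W, x):
    """Stand-in for a model forward pass: y = W x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def pick_group_size(W, x, group_sizes, a=1.0, bits=4):
    """Feed the same input to the original and quantized weights and return
    (group_size, squared_error) for the candidate with the smallest error.

    Assumption: "smallest output difference" is the selection criterion.
    """
    ref = matvec(W, x)
    best = None
    for g in group_sizes:
        Wq = [quantize_row(row, g, a, bits) for row in W]
        err = sum((o - r) ** 2 for o, r in zip(matvec(Wq, x), ref))
        if best is None or err < best[1]:
            best = (g, err)
    return best


W = [[0.0, 1.0, 10.0, 11.0]]
x = [1.0, 2.0, 3.0, 4.0]
print(pick_group_size(W, x, group_sizes=[4, 2], bits=1))  # (2, 0.0)
```

In this toy input, a group size of 2 keeps the small and large weights in separate groups, so each group's min/max range fits its values exactly at 1 bit, whereas a single group of 4 has to round 1.0 down to 0.0.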