Mixed-precision quantization method and device for large language models, electronic equipment, and medium
Main authors: | , , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Summary: | The invention relates to the technical field of model quantization, and in particular to a mixed-precision quantization method and device for large language models, electronic equipment, and a medium. The method comprises: obtaining the weights of each of a plurality of layers of a current large language model; determining, based on a preset loss function, the quantization bit width allocated to each layer's weights according to the sensitivity of those weights to quantization error; and, when the quantization bit width of the current layer's weights is judged to be smaller than a preset threshold, dividing the weights of the current layer into normal data and outlier data, quantizing the normal data with the bit width allocated to the current layer, and leaving the outlier data out of quantization. Therefore, the problems that the numerical value of important outlier data is changed and hardware resources are was... |
---|
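The steps named in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the record does not disclose the loss function, the outlier criterion, or the threshold, so everything specific here is an assumption. The sensitivity proxy (mean squared error at the low bit width), the 3-sigma outlier rule, the symmetric uniform quantizer, and all function and parameter names (`allocate_bits`, `split_outliers`, `quantize_layer`, `bit_threshold`, `k`) are hypothetical.

```python
from typing import List, Set


def quantize_uniform(values: List[float], bits: int) -> List[float]:
    """Symmetric uniform quantize-dequantize (assumed quantizer, not from the patent)."""
    if not values:
        return []
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return [0.0] * len(values)
    levels = 2 ** (bits - 1) - 1          # e.g. 3 positive levels for 3-bit signed
    scale = max_abs / levels
    return [round(v / scale) * scale for v in values]


def allocate_bits(layers: List[List[float]], low_bits: int = 3,
                  high_bits: int = 8, n_high: int = 1) -> List[int]:
    """Give the n_high most sensitive layers high_bits, the rest low_bits.

    Sensitivity is approximated by the MSE at low_bits; this stands in for
    the 'preset loss function', which the record does not specify.
    """
    def mse(w: List[float]) -> float:
        q = quantize_uniform(w, low_bits)
        return sum((a - b) ** 2 for a, b in zip(w, q)) / len(w)

    ranked = sorted(range(len(layers)), key=lambda i: mse(layers[i]), reverse=True)
    high = set(ranked[:n_high])
    return [high_bits if i in high else low_bits for i in range(len(layers))]


def split_outliers(weights: List[float], k: float = 3.0) -> Set[int]:
    """Indices with |w - mean| > k * std (assumed outlier criterion)."""
    n = len(weights)
    mean = sum(weights) / n
    std = (sum((w - mean) ** 2 for w in weights) / n) ** 0.5
    return {i for i, w in enumerate(weights) if abs(w - mean) > k * std}


def quantize_layer(weights: List[float], bits: int,
                   bit_threshold: int = 4, k: float = 3.0) -> List[float]:
    """Quantize one layer; below bit_threshold, outliers keep their original values."""
    if bits >= bit_threshold:
        return quantize_uniform(weights, bits)
    outliers = split_outliers(weights, k)
    normal = [w for i, w in enumerate(weights) if i not in outliers]
    q_normal = iter(quantize_uniform(normal, bits))
    return [w if i in outliers else next(q_normal) for i, w in enumerate(weights)]


# Usage: 21 small weights plus one large outlier, quantized at 3 bits.
weights = [0.01 * i for i in range(-10, 11)] + [5.0]
mixed = quantize_layer(weights, bits=3)      # outlier excluded from quantization
naive = quantize_uniform(weights, bits=3)    # outlier stretches the scale
```

In this toy case the naive 3-bit quantizer collapses every small weight to zero because the single outlier sets the scale, while the split version keeps the outlier unchanged and quantizes the remaining weights on a scale fitted to their own range.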