Mixed-precision quantization method and device for large language models, electronic equipment and medium

Bibliographic details
Main authors: WANG LUNING, WANG YU, LIU TENGXUAN, LI SHIYAO, NING XUEFEI
Format: Patent
Language: Chinese; English
Description
Summary: The invention relates to the technical field of model quantization, in particular to a mixed-precision quantization method and device for large language models, electronic equipment and a medium. The method comprises the steps of: obtaining the weight of each layer in a plurality of layers of a current large language model; determining, based on a preset loss function, a quantization bit width allocated to the weight of each layer according to the sensitivity of that weight to quantization error; and, in response to determining that the quantization bit width of the weight of the current layer is smaller than a preset threshold, dividing the weight of the current layer into normal data and outlier data, quantizing the normal data based on the quantization bit width allocated to the current layer, and excluding the outlier data from quantization. Therefore, the problems that the numerical value of important outlier data is changed and hardware resources are was
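
The steps in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation: the record does not state the loss function, the threshold value, the outlier criterion or the quantizer, so everything below is an assumption made for illustration. In particular, the rank-based rule in `allocate_bit_widths`, the 3-sigma outlier test, the symmetric uniform quantizer and the default `threshold_bits=3` are hypothetical choices, and all function names are invented here.

```python
import statistics


def allocate_bit_widths(sensitivities, low_bits=2, high_bits=4, high_fraction=0.5):
    """Hypothetical allocation rule: the most sensitive layers get high_bits,
    the remaining layers get low_bits. The patent derives sensitivity from a
    preset loss function; here the sensitivities are simply given as numbers."""
    n_high = round(len(sensitivities) * high_fraction)
    ranked = sorted(range(len(sensitivities)),
                    key=lambda i: sensitivities[i], reverse=True)
    high = set(ranked[:n_high])
    return [high_bits if i in high else low_bits
            for i in range(len(sensitivities))]


def quantize_uniform(values, bits):
    """Symmetric uniform quantization (assumed quantizer).
    Returns the dequantized values so the error is easy to inspect."""
    if not values:
        return []
    max_abs = max(abs(v) for v in values)
    levels = 2 ** (bits - 1) - 1  # 1 level per side for 2 bits, 7 for 4 bits
    if max_abs == 0 or levels == 0:
        return [0.0 for _ in values]
    scale = max_abs / levels
    return [max(-levels, min(levels, round(v / scale))) * scale for v in values]


def quantize_layer(weights, bits, threshold_bits=3, k_sigma=3.0):
    """Quantize one layer's weights (given as a flat list of floats).

    If the allocated bit width is below threshold_bits, weights further than
    k_sigma standard deviations from the mean are treated as outliers and kept
    unchanged; only the normal data is quantized. Otherwise every weight is
    quantized. Both threshold_bits and k_sigma are assumed values."""
    if bits >= threshold_bits:
        return quantize_uniform(weights, bits)
    mean = statistics.fmean(weights)
    std = statistics.pstdev(weights)
    is_outlier = [abs(w - mean) > k_sigma * std for w in weights]
    normal = [w for w, out in zip(weights, is_outlier) if not out]
    quantized = iter(quantize_uniform(normal, bits))
    # Re-interleave: outliers pass through untouched, normal data is quantized.
    return [w if out else next(quantized) for w, out in zip(weights, is_outlier)]
```

Calling `quantize_layer` on a layer with one large weight among many small ones and `bits=2` leaves the large weight unchanged, while the small weights are mapped onto a grid whose scale is set by the normal data alone. With `bits=4` in this sketch the layer is not split, so the large weight sets the scale and the small weights collapse toward zero.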