Quantization method of large language model, electronic equipment, chip system and storage medium

Bibliographic details
Main author: XU CHENGGUO
Format: Patent
Language: chi; eng
Description
Summary: The invention provides a quantization method for a large language model, an electronic device, a chip system, and a storage medium, and relates to the technical field of model quantization. The method groups the weights of the large language model along a dimension using a plurality of group sizes. For each group size, the quantized element corresponding to each element in a group is calculated from the maximum and minimum values of the elements in that group and the scaling coefficient a of the weight; in this way, the quantized elements of each weight in the large language model are obtained. The same business data is input into the large language model with quantized weights and the large language model with unquantized weights so that the output difference between the two models converges, and the large language model corresponding to the grouping mode with the minimum
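
The grouping idea in the summary can be illustrated with a minimal sketch. It is not the patented method: the summary does not give the formula, so the code assumes that the coefficient a (taken to be positive) simply scales the per-group minimum and maximum before a uniform min-max quantization, and it replaces the convergence step with a one-shot comparison of outputs on the same input. The function names, the 4-bit default, and the toy single-layer "model" are hypothetical.

```python
def quantize_group(values, a=1.0, bits=4):
    """Quantize and dequantize one group of weights.

    Assumption: `a` scales the group's min and max before a uniform grid
    with 2**bits levels is laid over the resulting range.
    """
    lo, hi = a * min(values), a * max(values)
    levels = 2 ** bits - 1
    scale = (hi - lo) / levels
    if scale == 0:
        # Constant group: nothing to quantize, return it unchanged.
        return list(values)
    out = []
    for v in values:
        q = round((v - lo) / scale)      # integer code
        q = max(0, min(levels, q))       # clip codes that fall outside the range
        out.append(q * scale + lo)       # dequantized value
    return out


def quantize_row(row, group_size, a=1.0, bits=4):
    """Split one weight row into consecutive groups and quantize each group."""
    out = []
    for start in range(0, len(row), group_size):
        out.extend(quantize_group(row[start:start + group_size], a, bits))
    return out


def matvec(weight, x):
    """Toy stand-in for a model: a single linear layer without bias."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weight]


def pick_group_size(weight, x, group_sizes, a=1.0, bits=4):
    """Return (group_size, error) with the smallest mean squared output difference."""
    reference = matvec(weight, x)
    best = None
    for g in group_sizes:
        quantized = [quantize_row(row, g, a, bits) for row in weight]
        output = matvec(quantized, x)
        err = sum((p - q) ** 2 for p, q in zip(output, reference)) / len(reference)
        if best is None or err < best[1]:
            best = (g, err)
    return best


# Made-up weights and input, used only to exercise the functions.
weight = [
    [0.12, -0.40, 0.33, 0.05, -0.21, 0.48, -0.07, 0.29],
    [-0.15, 0.22, -0.31, 0.44, 0.09, -0.38, 0.27, -0.02],
]
x = [0.5, -1.0, 0.25, 0.75, -0.5, 1.0, 0.1, -0.3]
print(pick_group_size(weight, x, group_sizes=[2, 4, 8]))
```

Smaller groups generally track the local value range more closely but require storing more per-group minimum and maximum values, which is one plausible reason to compare several group sizes rather than fix one in advance.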