Large model reasoning acceleration method and system combining machine learning and speculation sampling

Bibliographic details
Main authors: XIE SHUGUI, WANG ZIBIN
Format: Patent
Language: chi; eng
Description
Summary: The invention discloses a large-model inference acceleration method and system that combines machine learning with speculative sampling. The method comprises the following steps: constructing an n-gram language model from retrieved local knowledge; in the n-gram model's inference stage, predicting the probability distribution of the next token over the vocabulary from the given tokens, and sampling the next token from that distribution; and implementing a speculative sampling algorithm based on the constructed n-gram model and the large model, thereby accelerating large-model inference. Compared with the approximate small model used by existing mainstream speculative sampling algorithms, the content generated by the method is more reliable, the computation is lighter, memory-access requirements are reduced, and inference is faster. According to the method, the improved speculative sampling algorithm is further applied to a transformers library and a reasoning frame
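
The steps named in the summary can be illustrated with a minimal sketch. It is not the patent's implementation: the function names (`build_ngram`, `ngram_dist`, `speculative_step`), the add-alpha smoothing, and the toy corpus are all assumptions made for illustration. The accept/reject rule is the standard speculative sampling rule (accept a drafted token with probability min(1, p/q), otherwise resample from the normalized positive part of p - q). The target model is represented by a plain callable; a real large model would score all drafted positions in a single forward pass.

```python
import random
from collections import Counter, defaultdict


def build_ngram(corpus_tokens, n=2):
    """Count next-token frequencies for every (n-1)-token context."""
    counts = defaultdict(Counter)
    for i in range(len(corpus_tokens) - n + 1):
        ctx = tuple(corpus_tokens[i:i + n - 1])
        counts[ctx][corpus_tokens[i + n - 1]] += 1
    return counts


def ngram_dist(counts, context, vocab, n=2, alpha=0.1):
    """Add-alpha smoothed next-token distribution over the vocabulary."""
    ctx = tuple(context[-(n - 1):]) if n > 1 else ()
    c = counts.get(ctx, Counter())
    total = sum(c.values()) + alpha * len(vocab)
    return {t: (c[t] + alpha) / total for t in vocab}


def sample(dist, rng):
    """Draw one token from a {token: weight} mapping."""
    tokens = list(dist)
    return rng.choices(tokens, weights=[dist[t] for t in tokens], k=1)[0]


def speculative_step(prefix, draft_dist, target_dist, k, rng):
    """Draft k tokens with the cheap model, then verify them with the target."""
    ctx, drafted = list(prefix), []
    for _ in range(k):
        q = draft_dist(ctx)
        tok = sample(q, rng)
        drafted.append((tok, q))
        ctx.append(tok)

    out = list(prefix)
    for tok, q in drafted:
        p = target_dist(out)  # stand-in for one batched large-model pass
        if rng.random() < min(1.0, p[tok] / q[tok]):
            out.append(tok)  # drafted token accepted
            continue
        # Rejected: resample from the positive part of (p - q), then stop.
        residual = {t: max(0.0, p[t] - q[t]) for t in p}
        out.append(sample(residual if sum(residual.values()) > 0 else p, rng))
        return out
    # Every draft accepted: take one extra token from the target.
    out.append(sample(target_dist(out), rng))
    return out


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ate".split()  # toy "local knowledge"
    vocab = sorted(set(corpus))
    counts = build_ngram(corpus, n=2)
    draft = lambda ctx: ngram_dist(counts, ctx, vocab, n=2)
    target = lambda ctx: {t: 1.0 / len(vocab) for t in vocab}  # hypothetical target
    print(speculative_step(["the"], draft, target, k=3, rng=random.Random(0)))
```

Each call extends the prefix by at least one and at most k + 1 tokens, which is where the speed-up comes from: the more often the n-gram draft agrees with the large model, the fewer large-model passes are needed per generated token.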