Large model cache-based speculative inference acceleration method
Format: Patent
Language: Chinese; English
Abstract: The invention discloses a speculative inference acceleration method based on a large model cache. The method comprises the following steps: S1, model architecture design: design the basic architecture of a small language model; S2, data preparation: collect and preprocess data for training the model; S3, training: train the small model using a deep learning framework; S4, large language model integration: integrate the trained small language model with a pre-trained large language model so that speculative decoding can be carried out using the large model's KV cache. The invention relates to the technical field of speculative inference. The method improves the practicality of the small model, provides a more efficient and more accurate solution for various language processing tasks, delivers a 1.5-2.0x speedup of large model inference, and greatly reduces running time.
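The core of step S4 is a draft-and-verify loop: the small model cheaply proposes several tokens, and the large model verifies them in a single batched pass. Below is a minimal, self-contained sketch of that loop. The toy `draft_dist` and `target_dist` functions, the vocabulary, and the constant `K` are illustrative stand-ins, not the patent's models; the acceptance rule is the standard speculative-sampling criterion min(1, p/q), and KV-cache reuse is indicated only in comments since the toy models are stateless.

```python
import random

VOCAB = list(range(8))          # toy vocabulary of 8 token ids (assumption)
K = 4                           # number of draft tokens proposed per round

def draft_dist(ctx):
    # Toy stand-in for the small draft model: a cheap, flat-ish
    # next-token distribution conditioned on the context.
    seed = sum(ctx) % 7
    w = [(i + seed) % 8 + 1 for i in VOCAB]
    s = sum(w)
    return [x / s for x in w]

def target_dist(ctx):
    # Toy stand-in for the large target model: a sharper distribution.
    seed = sum(ctx) % 5
    w = [((i + seed) % 8 + 1) ** 2 for i in VOCAB]
    s = sum(w)
    return [x / s for x in w]

def sample(dist):
    return random.choices(VOCAB, weights=dist, k=1)[0]

def speculative_step(ctx):
    """One draft-and-verify round; returns the tokens appended this round.

    In a real system, the target distributions for all K+1 positions come
    from one forward pass that reuses the large model's KV cache, so only
    the new draft positions are actually evaluated.
    """
    # 1. Draft model proposes K tokens autoregressively (cheap).
    drafts, q_dists = [], []
    c = list(ctx)
    for _ in range(K):
        q = draft_dist(c)
        t = sample(q)
        drafts.append(t); q_dists.append(q); c.append(t)

    # 2. Target model scores every proposed position (one "pass").
    p_dists = [target_dist(ctx + drafts[:i]) for i in range(K + 1)]

    # 3. Accept each draft token with probability min(1, p/q); on the
    #    first rejection, resample from the residual distribution.
    accepted = []
    for i, t in enumerate(drafts):
        p, q = p_dists[i][t], q_dists[i][t]
        if random.random() < min(1.0, p / q):
            accepted.append(t)
        else:
            residual = [max(p_dists[i][v] - q_dists[i][v], 0.0) for v in VOCAB]
            s = sum(residual) or 1.0
            accepted.append(sample([r / s for r in residual]))
            return accepted            # drop the remaining drafts
    # All K drafts accepted: take one bonus token from the target model.
    accepted.append(sample(p_dists[K]))
    return accepted

ctx = [1, 2, 3]
for _ in range(5):
    ctx += speculative_step(ctx)
print("generated:", ctx)
```

Each round emits between 1 and K+1 tokens at the cost of roughly one large-model pass; when the draft model's proposals are usually accepted, this amortization is consistent with the 1.5-2.0x speedup the abstract reports.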