Large-model-cache-based speculative inference acceleration method


Detailed Description

Bibliographic Details
Main Authors: LI TONG, BIAN ZHENGDA, CUI ZIYUAN, DU CUNXIAO, XU YUANCHEN, MAI SIQI, LIU HONGXIN, LI YONGBIN, LEE, SEUNG-GYE
Format: Patent
Language: Chinese; English
Description
Summary: The invention discloses a speculative inference acceleration method based on the large-model KV cache. The method comprises the following steps: S1, model architecture design: design the basic architecture of a small language model; S2, data preparation: collect and preprocess data for training the model; S3, training: train the small model using a deep learning framework; S4, integration with the large language model: integrate the trained small language model with a pre-trained large language model so that speculative decoding can reuse the large model's KV cache. The invention relates to the technical field of speculative inference. The method improves the practicality of the small model, provides a more efficient and more accurate solution for various language-processing tasks, accelerates large-model inference by a factor of 1.5 to 2.0, and greatly reduces running time.
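The verification loop behind step S4 can be sketched as follows. This is a minimal illustration of greedy speculative decoding, not the patented method itself: the `draft_model` and `target_model` functions below are hypothetical toy stand-ins for the small and large language models, and the single-pass, KV-cache-reusing verification of a real system is simulated here with a plain loop.

```python
def draft_model(prefix):
    # Toy stand-in for the small draft model: fast but sometimes wrong.
    return (prefix[-1] + 1) % 10

def target_model(prefix):
    # Toy stand-in for the large target model: the "ground truth" next token.
    return (prefix[-1] + 1) % 10 if prefix[-1] % 3 else (prefix[-1] + 2) % 10

def speculative_decode(prompt, n_tokens, k=4):
    """Greedy speculative decoding: the draft model proposes k tokens,
    the target model verifies them; the matching prefix is accepted and
    the first mismatch is replaced by the target model's own token."""
    out = list(prompt)
    while len(out) - len(prompt) < n_tokens:
        # 1. Draft phase: propose k tokens autoregressively with the small model.
        draft, ctx = [], list(out)
        for _ in range(k):
            t = draft_model(ctx)
            draft.append(t)
            ctx.append(t)
        # 2. Verify phase: check each proposed token against the target model.
        #    In a real system this is one batched forward pass that reuses
        #    the large model's KV cache, which is where the speedup comes from.
        ctx = list(out)
        for t in draft:
            expect = target_model(ctx)
            if t == expect:
                out.append(t)
                ctx.append(t)
            else:
                out.append(expect)  # correction token from the target model
                break
    return out[len(prompt):len(prompt) + n_tokens]
```

Because every emitted token is either verified or produced by the target model, the output is identical to decoding with the target model alone; the gain is that several tokens can be accepted per large-model pass.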