Large model cache-based speculative inference acceleration method
Format: Patent
Language: Chinese; English
Abstract: The invention discloses a speculative inference acceleration method based on a large model cache. The method comprises the following steps: S1, model architecture design: design the basic architecture of a small language model; S2, data preparation: collect and preprocess data for training the model; S3, training: train the small model using a deep learning framework; S4, large language model integration: integrate the trained small language model with a pre-trained large language model so that speculative decoding can be carried out using the large model's KV cache. The invention relates to the technical field of speculative inference. The method improves the practicality of the small model, provides a more efficient and more accurate solution for various language processing tasks, delivers a 1.5-2.0x speedup of large model inference, and greatly reduces running time.
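The core of step S4 is a draft-and-verify loop: the small model cheaply proposes several tokens, and the large model verifies them in a single batched pass. Below is a minimal, self-contained sketch of that loop. The toy `draft_dist` and `target_dist` functions, the vocabulary, and the constant `K` are illustrative stand-ins, not the patent's models; the acceptance rule is the standard speculative-sampling criterion min(1, p/q), and KV-cache reuse is indicated only in comments since the toy models are stateless.

```python
import random

VOCAB = list(range(8))          # toy vocabulary of 8 token ids (assumption)
K = 4                           # number of draft tokens proposed per round

def draft_dist(ctx):
    # Toy stand-in for the small draft model: a cheap, flat-ish
    # next-token distribution conditioned on the context.
    seed = sum(ctx) % 7
    w = [(i + seed) % 8 + 1 for i in VOCAB]
    s = sum(w)
    return [x / s for x in w]

def target_dist(ctx):
    # Toy stand-in for the large target model: a sharper distribution.
    seed = sum(ctx) % 5
    w = [((i + seed) % 8 + 1) ** 2 for i in VOCAB]
    s = sum(w)
    return [x / s for x in w]

def sample(dist):
    return random.choices(VOCAB, weights=dist, k=1)[0]

def speculative_step(ctx):
    """One draft-and-verify round; returns the tokens appended this round.

    In a real system, the target distributions for all K+1 positions come
    from one forward pass that reuses the large model's KV cache, so only
    the new draft positions are actually evaluated.
    """
    # 1. Draft model proposes K tokens autoregressively (cheap).
    drafts, q_dists = [], []
    c = list(ctx)
    for _ in range(K):
        q = draft_dist(c)
        t = sample(q)
        drafts.append(t); q_dists.append(q); c.append(t)

    # 2. Target model scores every proposed position (one "pass").
    p_dists = [target_dist(ctx + drafts[:i]) for i in range(K + 1)]

    # 3. Accept each draft token with probability min(1, p/q); on the
    #    first rejection, resample from the residual distribution.
    accepted = []
    for i, t in enumerate(drafts):
        p, q = p_dists[i][t], q_dists[i][t]
        if random.random() < min(1.0, p / q):
            accepted.append(t)
        else:
            residual = [max(p_dists[i][v] - q_dists[i][v], 0.0) for v in VOCAB]
            s = sum(residual) or 1.0
            accepted.append(sample([r / s for r in residual]))
            return accepted            # drop the remaining drafts
    # All K drafts accepted: take one bonus token from the target model.
    accepted.append(sample(p_dists[K]))
    return accepted

ctx = [1, 2, 3]
for _ in range(5):
    ctx += speculative_step(ctx)
print("generated:", ctx)
```

Each round emits between 1 and K+1 tokens at the cost of roughly one large-model pass; when the draft model's proposals are usually accepted, this amortization is consistent with the 1.5-2.0x speedup the abstract reports.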