Long text large model training method and device
Format: Patent
Language: Chinese; English
Abstract: The invention relates to a method and device for training a large model on long text, and belongs to the technical field of model training. The method comprises: acquiring an original input matrix corresponding to long text data; performing feature enhancement and multi-semantic transfer processing on the original input matrix to obtain a first matrix; performing feature extraction and data enhancement processing on the first matrix to obtain a second matrix; calculating a query matrix, a key matrix and a value matrix from the second matrix; calculating a self-attention weight from the query matrix, the key matrix and the value matrix; shifting the query matrix, the key matrix and the value matrix; calculating a gradient from the self-attention weight and updating the model parameters by gradient descent according to the model's loss function; and repeating these steps until the model converges, yielding the long-text large model. According to the method, computing re…
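The abstract lists the training steps without fixing their details, so the following is only a minimal sketch of one possible reading, under stated assumptions. The input `x` stands in for the second matrix (the feature-enhancement and data-enhancement steps are not modelled); `w_q`, `w_k` and `w_v` are hypothetical projection matrices; the shift is assumed to be a cyclic roll of Q, K and V by half a group, borrowed from shifted group attention schemes such as LongLoRA's S²-Attn rather than taken from the patent; the loss is a placeholder mean-squared error against a random target; and gradients are estimated by finite differences instead of backpropagation. All function names are illustrative.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attention_weights(q, k):
    # Scaled dot-product self-attention weights; each row sums to 1.
    return softmax(q @ k.T / np.sqrt(q.shape[-1]))


def grouped_attention(x, w_q, w_k, w_v, group, offset=0):
    """Self-attention restricted to non-overlapping groups of `group` tokens.

    ASSUMPTION: "shifting" Q, K and V is modelled as a cyclic roll of
    `offset` tokens along the sequence axis; the patent's rule is not given.
    """
    # Project the (n, d) input into query, key and value matrices, then shift.
    q, k, v = (np.roll(x @ w, -offset, axis=0) for w in (w_q, w_k, w_v))
    out = np.empty_like(v)
    for start in range(0, x.shape[0], group):
        s = slice(start, start + group)
        out[s] = attention_weights(q[s], k[s]) @ v[s]
    # Roll back so each output row lines up with its original token position.
    return np.roll(out, offset, axis=0)


def loss(params, x, target, group):
    w_q, w_k, w_v = params
    # Hypothetical mix: average an unshifted pass and a half-group-shifted pass.
    y = 0.5 * (grouped_attention(x, w_q, w_k, w_v, group)
               + grouped_attention(x, w_q, w_k, w_v, group, group // 2))
    return float(np.mean((y - target) ** 2))  # placeholder MSE loss


def numerical_grad(f, params, eps=1e-5):
    # Central finite differences: generic but slow; real training uses autodiff.
    grads = []
    for p in params:
        g = np.zeros_like(p)
        for idx in np.ndindex(p.shape):
            old = p[idx]
            p[idx] = old + eps
            up = f(params)
            p[idx] = old - eps
            down = f(params)
            p[idx] = old  # restore the original parameter value
            g[idx] = (up - down) / (2 * eps)
        grads.append(g)
    return grads


def train(x, target, group=4, lr=0.1, steps=200, tol=1e-8, seed=0):
    rng = np.random.default_rng(seed)
    d = x.shape[1]
    params = [rng.normal(scale=0.3, size=(d, d)) for _ in range(3)]

    def objective(ps):
        return loss(ps, x, target, group)

    history = [objective(params)]
    for _ in range(steps):
        for p, g in zip(params, numerical_grad(objective, params)):
            p -= lr * g  # in-place gradient-descent update
        history.append(objective(params))
        if abs(history[-2] - history[-1]) < tol:  # crude convergence test
            break
    return params, history


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = rng.normal(size=(8, 4))       # stands in for the "second matrix"
    target = rng.normal(size=(8, 4))  # hypothetical regression target
    _, history = train(x, target, steps=50)
    print(f"loss {history[0]:.4f} -> {history[-1]:.4f} in {len(history) - 1} steps")
```

Averaging an unshifted and a shifted pass lets information cross group boundaries while each attention call still covers only `group` tokens, which is the usual motivation for shifting in grouped attention over long sequences. The finite-difference gradient keeps the sketch free of extra dependencies, but its cost grows with the number of parameters, so a practical implementation would rely on automatic differentiation instead.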