Long text large model training method and device

Bibliographic details
Main authors: NIU YIFAN, CHEN SHUO, ZHANG LINXIN, CAO MENGJIA, ZHAO RUIJING, BAO SIYU
Format: Patent
Language: Chinese; English
Description
Summary: The invention relates to a method and device for training a large model on long text, and belongs to the technical field of model training. The method comprises: acquiring an original input matrix corresponding to long text data; performing feature enhancement and multi-semantic transfer processing on the original input matrix to obtain a first matrix; performing feature extraction and data enhancement processing on the first matrix to obtain a second matrix; calculating a query matrix, a key matrix and a value matrix from the second matrix; calculating a self-attention weight from the query matrix, the key matrix and the value matrix; shifting the query matrix, the key matrix and the value matrix; calculating a gradient from the self-attention weight and updating the model parameters by gradient descent according to the loss function of the model; and repeating these steps until the model converges, yielding a long text large model. According to the method, computing re
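
The query/key/value and self-attention steps named in the summary can be illustrated with a minimal sketch. It assumes the standard scaled dot-product form of self-attention, which the summary does not confirm; the function names, the projection matrices w_q, w_k, w_v, and all sizes are hypothetical. The feature enhancement, shifting, and gradient-descent steps are left out because the summary does not say how they are carried out.

```python
import numpy as np


def softmax(x, axis=-1):
    # Subtract the row maximum before exponentiating for numerical stability.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)


def self_attention(x, w_q, w_k, w_v):
    """Return (attention weights, attended output) for x of shape (seq_len, d_model).

    Assumption: standard scaled dot-product attention, not necessarily the
    formulation used in the patent.
    """
    q = x @ w_q  # query matrix
    k = x @ w_k  # key matrix
    v = x @ w_v  # value matrix
    d_k = q.shape[-1]
    # Each row of `weights` is a probability distribution over positions.
    weights = softmax(q @ k.T / np.sqrt(d_k))
    return weights, weights @ v


rng = np.random.default_rng(0)
seq_len, d_model, d_k = 6, 8, 4  # hypothetical sizes
x = rng.normal(size=(seq_len, d_model))  # stand-in for the "second matrix"
w_q, w_k, w_v = (rng.normal(size=(d_model, d_k)) for _ in range(3))

weights, out = self_attention(x, w_q, w_k, w_v)
print(weights.shape, out.shape)
```

With these sizes the weight matrix has shape (seq_len, seq_len) and the output has shape (seq_len, d_k). The quadratic growth of the weight matrix with sequence length is the usual cost concern for long text inputs.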