Pre-training method of multi-modal universal model, speech recognition method and related device


Bibliographic details
Main authors: LU HAIJUN, ZHU JIAQUAN, CHENG LEI, YANG YANG, CAI XUPU
Format: Patent
Language: Chinese; English
Description
Summary: The invention provides a pre-training method for a multi-modal universal model, a speech recognition method, and a related device. The multi-modal universal model can be trained on data of different modalities, which improves its universality for downstream tasks with multi-modal input and its speech recognition accuracy. The parameters of the model are adjusted using the distance between the data features of the data in a homologous data set as the training target, so that the model reaches the same understanding of data that differ in modality but describe the same or similar content. This improves the accuracy of prediction results for downstream tasks with multi-modal input and the model's ability to solve such tasks.
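
The training idea in the summary can be illustrated with a toy sketch. It is not the patented method: it assumes one hypothetical linear projection per modality (`w_audio`, `w_text`), a squared Euclidean distance as the feature distance, and plain gradient descent that reduces that distance for one homologous pair. None of these choices are stated in the record.

```python
import random

def encode(weights, x):
    """Hypothetical linear encoder: project an input vector to a feature vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def homologous_loss(w_audio, w_text, audio, text):
    """Feature distance for two items that describe the same content."""
    return sq_dist(encode(w_audio, audio), encode(w_text, text))

def train_step(w_audio, w_text, audio, text, lr=0.01):
    """One gradient-descent step that reduces the squared feature distance."""
    fa, ft = encode(w_audio, audio), encode(w_text, text)
    diff = [a - t for a, t in zip(fa, ft)]
    # dL/dW_audio[i][j] = 2 * diff[i] * audio[j]
    new_wa = [[w - lr * 2 * d * xj for w, xj in zip(row, audio)]
              for row, d in zip(w_audio, diff)]
    # dL/dW_text[i][j] = -2 * diff[i] * text[j]
    new_wt = [[w + lr * 2 * d * xj for w, xj in zip(row, text)]
              for row, d in zip(w_text, diff)]
    return new_wa, new_wt

# Assumed toy data: one audio vector and one text vector for the same content.
random.seed(0)
audio = [0.5, -1.0, 0.25, 0.8]
text = [1.0, 0.3, -0.6]
w_audio = [[random.uniform(-1, 1) for _ in audio] for _ in range(2)]
w_text = [[random.uniform(-1, 1) for _ in text] for _ in range(2)]

before = homologous_loss(w_audio, w_text, audio, text)
for _ in range(200):
    w_audio, w_text = train_step(w_audio, w_text, audio, text)
after = homologous_loss(w_audio, w_text, audio, text)
```

After the loop, `after` is smaller than `before`, meaning the two modalities map to nearly the same feature. Used alone, a distance objective like this can collapse all features to a single point, so a practical system would presumably combine it with other training objectives; the record does not say which.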