Pre-training method of multi-modal universal model, speech recognition method and related device

Bibliographic details
Main authors:	LU HAIJUN, ZHU JIAQUAN, CHENG LEI, YANG YANG, CAI XUPU
Format:	Patent
Language:	chi ; eng
Subjects:	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Summary:	The invention provides a pre-training method for a multi-modal universal model, a speech recognition method, and a related device. The multi-modal universal model is trained on data of different modalities, which improves its universality for downstream tasks with multi-modal input and its speech recognition accuracy. The parameters of the model are adjusted by taking as the training target the distance between the data features of the data in a homologous data set, so that the model understands in the same way data that differ in modality but describe the same or similar content. This improves the accuracy of prediction results for downstream tasks with multi-modal input, and with it the model's ability to solve such tasks.
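
The distance-based training target described in the summary can be illustrated with a small sketch. This is not the patented implementation: it assumes the target is to reduce the mean pairwise squared Euclidean distance between the features in one homologous set, and it updates the feature vectors directly as a stand-in for updating the parameters of the model that produces them. The function names, the feature values, and the learning rate are all hypothetical.

```python
from itertools import combinations


def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def homologous_distance_loss(features):
    """Mean pairwise squared distance over the features of one homologous set.

    Assumption: the summary only says that "the distance" is the target;
    squared Euclidean distance is chosen here purely for illustration.
    """
    pairs = list(combinations(features, 2))
    return sum(squared_distance(a, b) for a, b in pairs) / len(pairs)


def gradient_step(features, lr=0.1):
    """One gradient-descent step that pulls the features toward each other.

    Assumption: features are treated as free parameters. In a real model the
    gradient would flow back into the encoder weights instead.
    """
    n = len(features)
    num_pairs = n * (n - 1) / 2
    updated = []
    for i, f_i in enumerate(features):
        grad = [0.0] * len(f_i)
        for j, f_j in enumerate(features):
            if i != j:
                for k in range(len(f_i)):
                    grad[k] += 2.0 * (f_i[k] - f_j[k]) / num_pairs
        updated.append([x - lr * g for x, g in zip(f_i, grad)])
    return updated


# Hypothetical features for one homologous set: a speech clip and its
# transcript, which describe the same content in different modalities.
homologous_set = {
    "speech": [0.9, 0.1, 0.3],
    "text": [0.7, 0.2, 0.5],
}

before = homologous_distance_loss(list(homologous_set.values()))
after = homologous_distance_loss(gradient_step(list(homologous_set.values())))
print(f"loss before: {before:.4f}, loss after one step: {after:.4f}")
```

After the step the two feature vectors lie closer together, which is the sense in which data of different modalities with the same content would come to be represented alike.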