Mongolian speech emotion recognition method based on Whisper pre-training model

Bibliographic details
Main authors:	YUAN SHUAI, REN-QING DAOERJI, OUNIER, JI YATU, LI LEIXIAO, SHI BAO
Format:	Patent
Language:	chi ; eng
Subjects:	ACOUSTICS; MUSICAL INSTRUMENTS; PHYSICS; SPEECH ANALYSIS OR SYNTHESIS; SPEECH OR AUDIO CODING OR DECODING; SPEECH OR VOICE PROCESSING; SPEECH RECOGNITION

Description
Summary:	The Mongolian speech emotion recognition method based on the Whisper pre-training model comprises the following steps: Mongolian emotional speech audio data are acquired, with each Mongolian audio clip corresponding to one Mongolian text; a logarithmic Mel spectrogram and rhythm features are extracted from the emotional speech; the logarithmic Mel spectrogram is input into the Whisper pre-trained model, the intermediate features obtained from all layers of the Whisper encoder are processed, and the features are adapted to the input dimension of the multi-head attention module through two consecutive non-linear fully connected layers; the processed spectral features and rhythm features are input into the multi-head attention module, where the key-value pairs of the attention mechanism are computed from the spectral features and the query vectors are computed from the rhythm features; after the output of the attention module is obtained, the mean and the variance of the output are calculated ...
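
The fusion step described in the summary can be illustrated with a minimal NumPy sketch. It is not the patented implementation. The following are all assumptions made for illustration: averaging the encoder layers to combine them, ReLU as the non-linearity, applying a two-layer adapter to both feature streams, the feature dimensions, the random weights, and the helper names (init, two_layer_adapter, cross_attention, stats_pool). Random arrays stand in for real Whisper encoder outputs and rhythm features.

```python
import numpy as np

rng = np.random.default_rng(0)

def init(*shape):
    # Small random weights; a trained model would learn these.
    return rng.normal(scale=0.1, size=shape)

def two_layer_adapter(x, w1, b1, w2, b2):
    # Two consecutive non-linear fully connected layers (ReLU is an assumption).
    hidden = np.maximum(x @ w1 + b1, 0.0)
    return np.maximum(hidden @ w2 + b2, 0.0)

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(query_src, kv_src, wq, wk, wv, wo, num_heads):
    # Queries come from query_src (rhythm); keys and values from kv_src (spectral).
    t_q, d_model = query_src.shape
    t_k = kv_src.shape[0]
    d_head = d_model // num_heads
    # Project, then split into heads: (heads, time, d_head).
    q = (query_src @ wq).reshape(t_q, num_heads, d_head).transpose(1, 0, 2)
    k = (kv_src @ wk).reshape(t_k, num_heads, d_head).transpose(1, 0, 2)
    v = (kv_src @ wv).reshape(t_k, num_heads, d_head).transpose(1, 0, 2)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, t_q, t_k)
    weights = softmax(scores, axis=-1)
    context = weights @ v                                 # (heads, t_q, d_head)
    merged = context.transpose(1, 0, 2).reshape(t_q, d_model)
    return merged @ wo, weights

def stats_pool(x):
    # Mean and variance over time, concatenated into one utterance-level vector.
    return np.concatenate([x.mean(axis=0), x.var(axis=0)])

# Toy dimensions (all hypothetical).
n_layers, t_spec, d_enc = 4, 50, 32   # stand-in for per-layer encoder outputs
t_rhythm, d_rhythm = 20, 6            # stand-in for frame-level rhythm features
d_model, num_heads = 16, 4

layer_feats = rng.normal(size=(n_layers, t_spec, d_enc))
rhythm_feats = rng.normal(size=(t_rhythm, d_rhythm))

# Assumption: combine the encoder layers by simple averaging.
spectral = layer_feats.mean(axis=0)

# Adapt both streams to the attention input dimension.
spectral_adapted = two_layer_adapter(
    spectral, init(d_enc, d_model), init(d_model), init(d_model, d_model), init(d_model))
rhythm_adapted = two_layer_adapter(
    rhythm_feats, init(d_rhythm, d_model), init(d_model), init(d_model, d_model), init(d_model))

attended, attn_weights = cross_attention(
    rhythm_adapted, spectral_adapted,
    init(d_model, d_model), init(d_model, d_model),
    init(d_model, d_model), init(d_model, d_model), num_heads)

embedding = stats_pool(attended)
print(embedding.shape)
```

In this sketch the pooled vector has twice the attention dimension, because the mean and variance halves are concatenated. What the method does with these statistics afterwards is not stated in the truncated summary, so the sketch stops at the pooling step.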