Mongolian speech emotion recognition method based on Whisper pre-training model
Main authors: | , , , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Online access: | Order full text |
Summary: | The Mongolian speech emotion recognition method based on the Whisper pre-training model comprises the following steps: Mongolian emotional speech audio data are acquired, with each Mongolian audio clip corresponding to one Mongolian text; a logarithmic Mel spectrogram and rhythm features are extracted from the emotional speech; the logarithmic Mel spectrogram is input into the Whisper pre-training model, the intermediate features output by every layer of the Whisper encoder are processed, and the result is adapted to the input dimension of the multi-head attention module through two consecutive non-linear fully connected layers; the processed spectrum features and the rhythm features are input into the multi-head attention module, where the key-value pairs of the attention mechanism are computed from the spectrum features and the query vectors are computed from the rhythm features; after the output of the attention module is obtained, the mean and the variance of the output are calculated ... |
---|
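The fusion step described in the summary (queries from the rhythm features, key-value pairs from the spectrum features, then mean and variance of the attention output) can be illustrated with a minimal NumPy sketch. This is not the patented implementation: the dimensions, the ReLU activation, the random weights, the single linear query projection, and all function names are assumptions made for illustration, and random arrays stand in for real Whisper encoder outputs and real rhythm features.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax along the given axis.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def two_layer_projection(x, w1, b1, w2, b2):
    # Two consecutive non-linear fully connected layers.
    # ReLU is an assumption; the summary does not name the activation.
    h = np.maximum(0.0, x @ w1 + b1)
    return np.maximum(0.0, h @ w2 + b2)


def cross_attention(query_feats, kv_feats, wq, wk, wv, num_heads):
    # Multi-head attention where queries and key-value pairs come from
    # different feature streams. Output projection is omitted for brevity.
    t_q, t_k = query_feats.shape[0], kv_feats.shape[0]
    d_model = wq.shape[1]
    d_head = d_model // num_heads

    def split_heads(x, t):
        # (t, d_model) -> (num_heads, t, d_head)
        return x.reshape(t, num_heads, d_head).transpose(1, 0, 2)

    q = split_heads(query_feats @ wq, t_q)
    k = split_heads(kv_feats @ wk, t_k)
    v = split_heads(kv_feats @ wv, t_k)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, t_q, t_k)
    out = softmax(scores, axis=-1) @ v                   # (heads, t_q, d_head)
    return out.transpose(1, 0, 2).reshape(t_q, d_model)


def mean_variance_pool(x):
    # Concatenate per-dimension mean and variance over the time axis.
    return np.concatenate([x.mean(axis=0), x.var(axis=0)])


# Hypothetical sizes, chosen only to keep the example small.
rng = np.random.default_rng(0)
d_whisper, d_hidden, d_rhythm, d_model, heads = 16, 12, 6, 8, 2

spec = rng.normal(size=(20, d_whisper))   # stand-in for Whisper encoder features
rhythm = rng.normal(size=(12, d_rhythm))  # stand-in for rhythm features

# Adapt the spectrum features to the attention input dimension.
spec_proj = two_layer_projection(
    spec,
    rng.normal(size=(d_whisper, d_hidden)), np.zeros(d_hidden),
    rng.normal(size=(d_hidden, d_model)), np.zeros(d_model),
)

attended = cross_attention(
    rhythm, spec_proj,
    wq=rng.normal(size=(d_rhythm, d_model)),
    wk=rng.normal(size=(d_model, d_model)),
    wv=rng.normal(size=(d_model, d_model)),
    num_heads=heads,
)
pooled = mean_variance_pool(attended)
print(attended.shape, pooled.shape)
```

In this sketch the attention output has one row per rhythm frame, so the pooled vector has a fixed length of twice the model dimension regardless of the audio duration. What the method does with the pooled statistics is cut off in the record above and is not modeled here.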