Epoch Estimation from Emotional Speech Signals Using Variational Mode Decomposition

Detailed description

Bibliographic details
Published in: Circuits, Systems, and Signal Processing, 2018-08, Vol. 37 (8), p. 3245-3274
Main authors: Lal, G. Jyothish, Gopalakrishnan, E. A., Govind, D.
Format: Article
Language: English
Online access: Full text
Description
Summary: This paper presents a novel approach for estimating epochs from emotional speech signals. Epochs are the locations of significant excitation of the vocal tract during the production of voiced sounds by vocal fold vibration. Estimating epoch locations is essential for deriving the instantaneous pitch contours needed for accurate emotion analysis. Many well-known epoch extraction algorithms show degraded performance because the excitation characteristics of emotional speech vary. The proposed approach uses a recent adaptive time series decomposition technique, variational mode decomposition (VMD), to estimate epochs. The VMD algorithm is applied to the emotional speech signal to decompose it into several sub-signals. Analysis of these sub-signals shows that the VMD algorithm, through its modes, captures a center frequency close to the fundamental frequency defined for each glottal cycle of the emotional speech utterance. This center frequency characteristic of the corresponding mode signal helps in accurately estimating epoch locations from the emotional speech signal. The proposed method is evaluated on six different emotions taken from the German emotional speech database, which includes simultaneous electroglottographic signals. Experimental results on clean emotive speech signals show that the proposed method provides an identification rate and accuracy comparable to those of the best-performing algorithm. In addition, the proposed method provides better reliability in epoch estimation from emotive speech signals degraded by noise.
ISSN: 0278-081X, 1531-5878
DOI: 10.1007/s00034-018-0804-x