Study of feature combination using HMM and SVM for multilingual Odiya speech emotion recognition
Emotions are broad aspects and expressed in a similar way by every human being; however, these are affected by culture. This creates a major threat to the universality of speech emotion detection system. Cultural behaviour of society affects the way emotions are expressed and perceived. Hence, an em...
Gespeichert in:
Veröffentlicht in: | International journal of speech technology 2015-09, Vol.18 (3), p.387-393 |
---|---|
Hauptverfasser: | , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Emotions are broad aspects and expressed in a similar way by every human being; however, these are affected by culture. This creates a major threat to the universality of speech emotion detection system. Cultural behaviour of society affects the way emotions are expressed and perceived. Hence, an emotion recognition system customized for languages within a cultural group is feasible. In this work, a speaker dependent and speaker independent emotion recognition system has been proposed for two different dialects of Odisha: Sambalpuri and Cuttacki. Spectral speech features, such as, log power, Mel-frequency cepstral coefficients (MFCC), Delta MFCC, Double delta MFCC, log frequency power coefficients, and linear predictive cepstral coefficients, are used with Hidden Markov model and support vector machines (SVM) classifier, for classifying a speech into one of the seven discrete emotion classes: anger, happiness, disgust, fear, sadness, surprise, and neutral. For a better comparative study of system’s accuracy, features are taken individually as well as in combinations by varying sampling frequency, frame length and frame overlapping. Best average recognition accuracy obtained for speaker independent system, is 82.14 % for SVM classifier using only MFCC as feature vector. However, for speaker dependent system a hike in accuracy of more than 10 % is seen. It is also revealed that use of MFCC on SVM classifier, not only gives the best overall performance on 8 kHz sampling frequency, but also shows consistent performance for all the emotion classes, compared to other classifiers and feature combinations with less computational complexity. Hence, it can be applied efficiently in call centre application for emotion recognition over telephone. |
---|---|
ISSN: | 1381-2416 1572-8110 |
DOI: | 10.1007/s10772-015-9275-7 |