Linguistic and Gender Variation in Speech Emotion Recognition using Spectral Features
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | 29th AICS Vol-3105 (2021) 141-152. This work explores the effect of gender and linguistic-based vocal variations
on the accuracy of emotive expression classification. Emotive expressions are
considered from the perspective of spectral features in speech (Mel-frequency
Cepstral Coefficients, Melspectrogram, Spectral Contrast). Emotions are
considered from the perspective of Basic Emotion Theory. A convolutional neural
network is utilised to classify emotive expressions in emotive audio datasets
in English, German, and Italian. Vocal variations for spectral features are
assessed by (i) a comparative analysis identifying suitable spectral features,
(ii) the classification performance for mono-, multi- and cross-lingual emotive
data and (iii) an empirical evaluation of a machine learning model to assess
the effects of gender and linguistic variation on classification accuracy. The
results showed that spectral features provide a potential avenue for improving
emotive expression classification. Additionally, the accuracy of emotive
expression classification was high within mono- and cross-lingual emotive data,
but poor in multi-lingual data. Similarly, there were differences in
classification accuracy between gender populations. These results demonstrate
the importance of accounting for population differences to enable accurate
speech emotion recognition. |
---|---|
DOI: | 10.48550/arxiv.2112.09596 |
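
The summary compares classification accuracy across gender and language populations. As a minimal sketch of how such a per-population breakdown might be computed from a classifier's predictions, the snippet below groups accuracy by an arbitrary attribute. The function name `accuracy_by_group`, the record fields, and the toy records are all hypothetical; they are not the paper's code or data.

```python
from collections import defaultdict


def accuracy_by_group(records, key):
    """Return {group: accuracy}, grouping records by the value of records[i][key]."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for r in records:
        group = r[key]
        total[group] += 1
        correct[group] += int(r["true"] == r["pred"])
    return {group: correct[group] / total[group] for group in total}


# Hypothetical toy predictions for illustration only (not results from the paper).
records = [
    {"lang": "en", "gender": "f", "true": "anger", "pred": "anger"},
    {"lang": "en", "gender": "m", "true": "joy", "pred": "joy"},
    {"lang": "de", "gender": "f", "true": "fear", "pred": "fear"},
    {"lang": "de", "gender": "m", "true": "joy", "pred": "joy"},
    {"lang": "it", "gender": "f", "true": "sadness", "pred": "joy"},
    {"lang": "it", "gender": "m", "true": "anger", "pred": "anger"},
]

by_gender = accuracy_by_group(records, "gender")
by_lang = accuracy_by_group(records, "lang")
print(by_gender)
print(by_lang)
```

Reporting accuracy per group rather than as a single pooled figure is what makes population differences of the kind described in the summary visible.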