Enhanced Deep Learning Techniques for Real-Time Speech Emotion Recognition in Multilingual Contexts

Emotion recognition from speech is crucial for advancing human-computer interactions, enabling more natural and empathetic communication. This study proposes a novel Speech Emotion Recognition (SER) framework that integrates Convolutional Neural Networks (CNNs) and transformer-based architectures to...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Engineering, technology & applied science research technology & applied science research, 2024-12, Vol.14 (6), p.18662-18669
Hauptverfasser: Badawood, Donia Y., Aldosari, Fahd M.
Format: Artikel
Sprache:eng
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Emotion recognition from speech is crucial for advancing human-computer interactions, enabling more natural and empathetic communication. This study proposes a novel Speech Emotion Recognition (SER) framework that integrates Convolutional Neural Networks (CNNs) and transformer-based architectures to capture local and contextual speech features. The model demonstrates strong classification performance, particularly for prominent emotions such as anger, sadness, and happiness. However, challenges persist in detecting less frequent emotions, such as surprise and calm, highlighting areas for improvement. The limitations of current datasets, such as limited linguistic diversity, are discussed. The findings underscore the model's robustness and identify avenues for future enhancement, such as incorporating more diverse datasets and employing techniques such as transfer learning. Future work will explore multimodal approaches and real-time implementation on edge devices to improve the system's adaptability in real-world scenarios.
ISSN:2241-4487
1792-8036
DOI:10.48084/etasr.9229