Enhanced Deep Learning Techniques for Real-Time Speech Emotion Recognition in Multilingual Contexts
Published in: | Engineering, Technology & Applied Science Research, 2024-12, Vol.14 (6), p.18662-18669 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Online access: | Full text |
Abstract: | Emotion recognition from speech is crucial for advancing human-computer interaction, enabling more natural and empathetic communication. This study proposes a novel Speech Emotion Recognition (SER) framework that integrates Convolutional Neural Networks (CNNs) and transformer-based architectures to capture local and contextual speech features. The model demonstrates strong classification performance, particularly for prominent emotions such as anger, sadness, and happiness. However, challenges persist in detecting less frequent emotions, such as surprise and calm, highlighting areas for improvement. The limitations of current datasets, including limited linguistic diversity, are discussed. The findings underscore the model's robustness and identify avenues for future enhancement, including more diverse datasets and techniques such as transfer learning. Future work will explore multimodal approaches and real-time implementation on edge devices to improve the system's adaptability in real-world scenarios. |
---|---|
ISSN: | 2241-4487, 1792-8036 |
DOI: | 10.48084/etasr.9229 |