SEGAA: A Unified Approach to Predicting Age, Gender, and Emotion in Speech
Main authors: , , , , , ,
Format: Article
Language: eng
Subjects:
Online access: Order full text
Abstract: The interpretation of human voices is important across a wide range of
applications. This study ventures into predicting age, gender, and emotion from
vocal cues. Advances in voice analysis technology span many domains, from
improving customer interactions to enhancing healthcare and retail experiences.
Discerning emotions aids mental health care, while age and gender detection are
vital in various contexts. Exploring deep learning models for these predictions
involves comparing the single, multi-output, and sequential models highlighted
in this paper. Sourcing suitable data posed challenges, resulting in the
amalgamation of the CREMA-D and EMO-DB datasets. Prior work showed promise in
individual predictions, but limited research has considered all three variables
simultaneously. This paper identifies flaws in the individual-model approach and
advocates for our novel multi-output learning architecture, the Speech-based
Emotion, Gender and Age Analysis (SEGAA) model. The experiments suggest that
multi-output models perform comparably to individual models, efficiently
capturing the intricate relationships between the variables and the speech
inputs, all while achieving improved runtime.
DOI: 10.48550/arxiv.2403.00887
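
The abstract contrasts three separate single-task models with a single multi-output network that predicts emotion, gender, and age from one shared speech representation. As an illustration only, the sketch below shows a minimal multi-output classifier in Keras; the MFCC input shape, layer sizes, and class counts are assumptions for the example and are not taken from the SEGAA paper.

```python
# Illustrative sketch of a multi-output speech classifier.
# Assumptions (not from the paper): MFCC-sequence inputs, 6 emotion classes,
# binary gender labels, and 4 age buckets.
import tensorflow as tf
from tensorflow.keras import layers, Model

NUM_EMOTIONS = 6    # assumed emotion label set
NUM_GENDERS = 2     # assumed binary gender labels
NUM_AGE_GROUPS = 4  # assumed age buckets


def build_multi_output_model(n_frames=200, n_mfcc=40):
    """Shared feature extractor with three task-specific output heads."""
    inputs = layers.Input(shape=(n_frames, n_mfcc), name="mfcc_sequence")

    # Shared trunk: 1-D convolutions over time followed by global pooling.
    x = layers.Conv1D(64, kernel_size=5, activation="relu")(inputs)
    x = layers.MaxPooling1D(2)(x)
    x = layers.Conv1D(128, kernel_size=5, activation="relu")(x)
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dense(128, activation="relu")(x)

    # One softmax head per target variable.
    emotion = layers.Dense(NUM_EMOTIONS, activation="softmax", name="emotion")(x)
    gender = layers.Dense(NUM_GENDERS, activation="softmax", name="gender")(x)
    age = layers.Dense(NUM_AGE_GROUPS, activation="softmax", name="age")(x)

    model = Model(inputs=inputs, outputs=[emotion, gender, age])
    model.compile(
        optimizer="adam",
        loss={
            "emotion": "sparse_categorical_crossentropy",
            "gender": "sparse_categorical_crossentropy",
            "age": "sparse_categorical_crossentropy",
        },
        metrics={"emotion": "accuracy", "gender": "accuracy", "age": "accuracy"},
    )
    return model


if __name__ == "__main__":
    build_multi_output_model().summary()
```

Because the three heads share one trunk, a single forward pass yields all three predictions, which is consistent with the runtime advantage the abstract claims over running three individual models.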