Multi-modal emotion recognition through adaptive normalization fusion with alpha Gaussian dropout in MCNN architecture


Bibliographic details
Published in: Signal, Image and Video Processing, 2024-03, Vol. 18 (2), pp. 1779–1791
Authors: Murugesan, M., Dhivya, P., Rajesh Kanna, P., Sathish Kumar, G.
Format: Article
Language: English
Online access: Full text
Description
Abstract: Emotion recognition through the fusion of multi-modal sources, including voice and visual cues, is of paramount importance for understanding human affective states. This research introduces a paradigm that unites adaptive normalization fusion (ANF) and alpha Gaussian dropout (AGD) within a multi-channel convolutional neural network (MCNN). ANF applies a sequence of pre-processing steps, beginning with batch normalization, followed by min–max scaling using the sigmoid function, and further augmented by AGD, a fusion of Gaussian dropout and alpha dropout. Integrating these techniques into the MCNN architecture improves emotion classification accuracy, strengthens the network's generalization, and mitigates overfitting. Rigorous experimentation and comprehensive evaluation on voice- and visual-based emotion recognition datasets demonstrate the efficiency of the proposed methodology, with notable gains in accuracy and generalization: 97% accuracy in face emotion recognition and 98% accuracy in speech emotion recognition.
ISSN: 1863-1703, 1863-1711
DOI: 10.1007/s11760-023-02847-x
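
For illustration, the pre-processing chain described in the abstract (batch normalization, sigmoid-based min-max scaling, and alpha Gaussian dropout inside a multi-channel CNN block) can be sketched as follows. The paper's exact formulation is not reproduced in this record, so the PyTorch sketch below reflects only one plausible reading; the class names (GaussianDropout, AlphaGaussianDropout, ANFConvBlock), the dropout rate p, and the layer ordering are illustrative assumptions, not the authors' implementation.

# Hypothetical sketch of the ANF + AGD block described in the abstract.
# Assumed reading: BatchNorm -> sigmoid "min-max" squashing -> a dropout layer
# that stacks Gaussian (multiplicative-noise) dropout with alpha dropout.
import torch
import torch.nn as nn


class GaussianDropout(nn.Module):
    """Multiplicative Gaussian-noise dropout: x * N(1, p / (1 - p)) at train time."""

    def __init__(self, p: float = 0.1):
        super().__init__()
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training or self.p <= 0.0:
            return x
        std = (self.p / (1.0 - self.p)) ** 0.5
        noise = torch.randn_like(x) * std + 1.0
        return x * noise


class AlphaGaussianDropout(nn.Module):
    """Assumed 'fusion' of Gaussian dropout and alpha dropout (ordering is a guess)."""

    def __init__(self, p: float = 0.1):
        super().__init__()
        self.gaussian = GaussianDropout(p)
        self.alpha = nn.AlphaDropout(p)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.alpha(self.gaussian(x))


class ANFConvBlock(nn.Module):
    """One MCNN channel block: conv -> batch norm -> sigmoid scaling -> AGD."""

    def __init__(self, in_ch: int, out_ch: int, p: float = 0.1):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.bn = nn.BatchNorm2d(out_ch)   # adaptive normalization step
        self.agd = AlphaGaussianDropout(p)  # alpha Gaussian dropout step

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.conv(x)
        x = torch.sigmoid(self.bn(x))  # sigmoid acts as a soft min-max scaling to (0, 1)
        return self.agd(x)


if __name__ == "__main__":
    block = ANFConvBlock(in_ch=3, out_ch=16, p=0.1)
    out = block(torch.randn(4, 3, 48, 48))  # e.g. a batch of 48x48 face crops
    print(out.shape)                        # torch.Size([4, 16, 48, 48])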