A Combined Rule-Based & Machine Learning Audio-Visual Emotion Recognition Approach


Bibliographic details
Published in:	IEEE Transactions on Affective Computing, 2018-01, Vol. 9 (1), pp. 3-13
Authors:	Seng, Kah Phooi; Ang, Li-Minn; Ooi, Chien Shing
Format:	Article
Language:	English
Subjects:	Artificial intelligence; audio-visual processing; Audio-visual systems; Basis functions; Discriminant analysis; Emotion recognition; Feature extraction; Machine learning; Modules; multimodal system; Principal component analysis; Principal components analysis; Radial basis function; rule-based; Visual discrimination; Visual signals

Description
Abstract:	This paper proposes an audio-visual emotion recognition system that uses a mixture of rule-based and machine learning techniques to improve the recognition efficacy in the audio and video paths. The visual path is designed using Bi-directional Principal Component Analysis (BDPCA) and Least-Square Linear Discriminant Analysis (LSLDA) for dimensionality reduction and discrimination. The extracted visual features are passed into a newly designed Optimized Kernel-Laplacian Radial Basis Function (OKL-RBF) neural classifier. The audio path is designed using a combination of input prosodic features (pitch, log-energy, zero-crossing rate and Teager energy operator) and spectral features (Mel-scale frequency cepstral coefficients). The extracted audio features are passed into an audio feature-level fusion module that uses a set of rules to determine the most likely emotion contained in the audio signal. An audio-visual fusion module fuses the outputs from both paths. The performance of the proposed audio path, visual path, and final system is evaluated on standard databases. Experimental results and comparisons indicate that the proposed system performs well.
ISSN:	1949-3045
DOI:	10.1109/TAFFC.2016.2588488
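
Illustration:	The abstract names several prosodic audio features (log-energy, zero-crossing rate, Teager energy operator). The sketch below shows one common textbook way to compute them with NumPy. It is not the authors' implementation: the function names, the 25 ms frame and 10 ms hop at an assumed 16 kHz sampling rate, and the synthetic test tone are all assumptions made for illustration, and pitch and MFCC extraction are omitted.

```python
import numpy as np

def frame_signal(x, frame_len, hop):
    """Split a 1-D signal into overlapping frames (one frame per row)."""
    n_frames = 1 + (len(x) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return x[idx]

def log_energy(frames, eps=1e-12):
    """Natural log of the sum of squared samples in each frame."""
    return np.log(np.sum(frames ** 2, axis=1) + eps)

def zero_crossing_rate(frames):
    """Fraction of adjacent sample pairs whose signs differ, per frame."""
    signs = np.signbit(frames)
    return np.mean(signs[:, 1:] != signs[:, :-1], axis=1)

def teager_energy(x):
    """Discrete Teager energy operator: psi[n] = x[n]^2 - x[n-1] * x[n+1]."""
    return x[1:-1] ** 2 - x[:-2] * x[2:]

# Hypothetical input: one second of a 200 Hz tone sampled at 16 kHz.
sr = 16000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 200 * t)

frames = frame_signal(tone, frame_len=400, hop=160)  # 25 ms frames, 10 ms hop
features = np.column_stack([log_energy(frames), zero_crossing_rate(frames)])
teo = teager_energy(tone)
```

For a pure sinusoid with amplitude A and digital frequency Ω (radians per sample), the Teager energy operator returns the constant A² sin²(Ω), which is why it is often used as a joint amplitude and frequency measure. In a pipeline like the one the abstract outlines, per-frame values such as these would be combined with pitch and spectral features before any rule-based decision step.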