Personality trait estimation in group discussions using multimodal analysis and speaker embedding

Bibliographic details
Published in:	Journal on multimodal user interfaces 2023-06, Vol.17 (2), p.47-63
Main authors:	Mawalim, Candy Olivia; Okada, Shogo; Nakano, Yukiko I.; Unoki, Masashi
Format:	Article
Language:	English
Subjects:	Ablation; Computer Science; Embedding; Human-computer interface; Image Processing and Computer Vision; Original Paper; Personality; Personality traits; Signal, Image and Speech Processing; User Interfaces and Human Computer Interaction
Online access:	Full text

Description
Abstract:	The automatic estimation of personality traits is essential for many human–computer interface (HCI) applications. This paper focused on improving Big Five personality trait estimation in group discussions via multimodal analysis and transfer learning with the state-of-the-art speaker individuality feature, namely, the identity vector (i-vector) speaker embedding. The experiments were carried out by investigating the effective and robust multimodal features for estimation with two group discussion datasets, i.e., the Multimodal Task-Oriented Group Discussion (MATRICS) (in Japanese) and Emergent Leadership (ELEA) (in European languages) corpora. Subsequently, the evaluation was conducted by using leave-one-person-out cross-validation (LOPCV) and ablation tests to compare the effectiveness of each modality. The overall results showed that the speaker-dependent features, e.g., the i-vector, effectively improved the prediction accuracy of Big Five personality trait estimation. In addition, the experimental results showed that audio-related features were the most prominent features in both corpora.
ISSN:	1783-7677 1783-8738
DOI:	10.1007/s12193-023-00401-0
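
Illustrative note: the abstract names two evaluation devices, leave-one-person-out cross-validation (LOPCV) and modality ablation. The sketch below shows only the general shape of both, not the authors' implementation. The function names, the participant IDs, and the modality labels ("audio", "visual", "text") are hypothetical placeholders; the abstract names only audio-related features explicitly.

```python
from collections import defaultdict


def leave_one_person_out(person_ids):
    """Yield (held_out_person, train_indices, test_indices), one fold per person.

    All rows belonging to the held-out person go to the test side, so no
    data from that person is ever seen during training.
    """
    rows_by_person = defaultdict(list)
    for idx, pid in enumerate(person_ids):
        rows_by_person[pid].append(idx)
    for pid, test_idx in rows_by_person.items():
        train_idx = [i for i, p in enumerate(person_ids) if p != pid]
        yield pid, train_idx, test_idx


def ablation_sets(modalities):
    """Yield (removed_modality, remaining_modalities), dropping one at a time."""
    for removed in modalities:
        yield removed, [m for m in modalities if m != removed]


# Hypothetical toy data: four feature rows from three participants.
person_ids = ["p1", "p1", "p2", "p3"]
folds = list(leave_one_person_out(person_ids))

# Hypothetical modality labels, used only to show the ablation loop.
modalities = ["audio", "visual", "text"]
ablations = list(ablation_sets(modalities))

for pid, train_idx, test_idx in folds:
    print(f"hold out {pid}: train rows {train_idx}, test rows {test_idx}")
for removed, remaining in ablations:
    print(f"without {removed}: evaluate with {remaining}")
```

In an actual experiment, a model would be trained and scored once per fold for each ablated feature set, and the drop in accuracy when a modality is removed would indicate how much that modality contributes.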