Fusion in Context: A Multimodal Approach to Affective State Recognition
Saved in:

Main author(s): , , , ,
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Accurate recognition of human emotions is a crucial challenge in affective computing and human-robot interaction (HRI). Emotional states play a vital role in shaping behaviors, decisions, and social interactions. However, emotional expressions can be influenced by contextual factors, leading to misinterpretations if context is not considered. Multimodal fusion, combining modalities like facial expressions, speech, and physiological signals, has shown promise in improving affect recognition. This paper proposes a transformer-based multimodal fusion approach that leverages facial thermal data, facial action units, and textual context information for context-aware emotion recognition. We explore modality-specific encoders to learn tailored representations, which are then fused using additive fusion and processed by a shared transformer encoder to capture temporal dependencies and interactions. The proposed method is evaluated on a dataset collected from participants engaged in a tangible tabletop Pacman game designed to induce various affective states. Our results demonstrate the effectiveness of incorporating contextual information and multimodal fusion for affective state recognition.
DOI: 10.48550/arxiv.2409.11906
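
The fusion scheme described in the abstract (modality-specific encoders, additive fusion, and a shared transformer encoder over time) can be illustrated with a minimal sketch. The PyTorch code below is not the authors' implementation: the layer sizes, the per-modality feature dimensions for thermal data, facial action units, and text context, the mean pooling over time, and the four-class output head are all illustrative assumptions.

```python
# Minimal sketch of the fusion scheme from the abstract (not the authors' code).
# Assumed: input feature dimensions, hidden sizes, pooling, and class count.
import torch
import torch.nn as nn


class AdditiveFusionTransformer(nn.Module):
    def __init__(self, input_dims, d_model=128, nhead=4, num_layers=2, num_classes=4):
        super().__init__()
        # One small encoder per modality (e.g. thermal, action units, text context),
        # each projecting its features into a shared embedding space.
        self.encoders = nn.ModuleList(
            nn.Sequential(nn.Linear(dim, d_model), nn.ReLU(), nn.Linear(d_model, d_model))
            for dim in input_dims
        )
        # Shared transformer encoder over the fused sequence to model
        # temporal dependencies and interactions across time steps.
        layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead, batch_first=True)
        self.transformer = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.classifier = nn.Linear(d_model, num_classes)

    def forward(self, modality_seqs):
        # modality_seqs: list of tensors, each of shape (batch, time, feature_dim_i)
        encoded = [enc(x) for enc, x in zip(self.encoders, modality_seqs)]
        fused = torch.stack(encoded, dim=0).sum(dim=0)   # additive fusion of modalities
        hidden = self.transformer(fused)                 # temporal modeling
        return self.classifier(hidden.mean(dim=1))       # pool over time, then classify


# Example with made-up feature sizes: thermal (32), action units (17), text context (300).
model = AdditiveFusionTransformer(input_dims=[32, 17, 300])
batch = [torch.randn(8, 50, d) for d in (32, 17, 300)]
logits = model(batch)  # shape: (8, 4)
```

Additive fusion keeps every modality in the same embedding space before the shared transformer, so the sketch simply sums the encoder outputs; other fusion operators (e.g. concatenation or attention-based weighting) would change only the `fused` line.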