An audio-visual dataset of human–human interactions in stressful situations

Bibliographic Details
Published in: Journal on Multimodal User Interfaces, 2014-03, Vol. 8 (1), pp. 29-41
Main authors: Lefter, Iulia; Burghouts, Gertjan J.; Rothkrantz, Leon J. M.
Format: Article
Language: English
Subjects:
Online access: Full text
Description
Abstract: Stressful situations are likely to occur at human-operated service desks, as well as at human–computer interfaces used in public spaces. Automatic surveillance can help by notifying operators when extra assistance is needed. Human communication is inherently multimodal, e.g. speech, gestures, and facial expressions, so automatic surveillance systems can be expected to benefit from exploiting multimodal information. This requires automatic fusion of modalities, which is still an unsolved problem. To support the development of such systems, we present and analyze audio-visual recordings of human–human interactions at a service desk. The corpus has a high degree of realism: all interactions are freely improvised by actors based on short scenarios in which only the sources of conflict were provided. The recordings can be considered a prototype for stressful human–human interaction in general. They were annotated on a 5-point scale for degree of stress from the perspective of surveillance operators. The recordings are very rich in hand gestures. We find that the more stressful the situation, the higher the proportion of speech that is accompanied by gestures. Understanding the function of gestures and their relation to speech is essential for good fusion strategies. Taking speech as the basic modality, one of our research questions was: what is the role of gestures in addition to speech? Both speech and gestures can express emotion, so we say that they have an emotional function. They can also express non-emotional information, in which case we say that they have a semantic function. We learn that when speech and gestures have the same function, they are usually congruent, but intensities and clarity can vary. Most gestures in this dataset convey emotion. We identify classes of gestures in our recordings and argue that some classes are clear indications of stressful situations.
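
As an illustration of how the reported gesture-speech relation might be computed from such annotations, the minimal Python sketch below derives the proportion of speech segments accompanied by gestures at each stress level. The segment record format, field names, and sample values are hypothetical assumptions for illustration only, not the corpus's actual annotation schema.

from collections import defaultdict

# Hypothetical annotation format: each speech segment carries the
# operator-assigned stress rating (1-5) and a flag indicating whether
# a hand gesture co-occurs with it. Field names are illustrative.
segments = [
    {"stress": 1, "has_gesture": False},
    {"stress": 3, "has_gesture": True},
    {"stress": 5, "has_gesture": True},
    {"stress": 5, "has_gesture": True},
]

def gesture_proportion_by_stress(segments):
    """Fraction of speech segments accompanied by a gesture, per stress level."""
    totals = defaultdict(int)
    with_gesture = defaultdict(int)
    for seg in segments:
        totals[seg["stress"]] += 1
        if seg["has_gesture"]:
            with_gesture[seg["stress"]] += 1
    return {level: with_gesture[level] / totals[level]
            for level in sorted(totals)}

print(gesture_proportion_by_stress(segments))
# On the sample data above this prints {1: 0.0, 3: 1.0, 5: 1.0};
# the paper's finding is that this proportion rises with stress level.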
ISSN: 1783-7677, 1783-8738
DOI: 10.1007/s12193-014-0150-7