Towards Symmetric Multimodality: Fusion and Fission of Speech, Gesture, and Facial Expression

Bibliographic details
Main author:	Wahlster, Wolfgang
Format:	Book chapter
Language:	English
Subjects:	Applied sciences; Artificial intelligence; Computer science, control theory, systems; Computer systems and distributed systems. User interface; Conversational Agent; Deictic Gesture; Dialogue System; Emotional Prosody; Exact sciences and technology; Learning and adaptive systems; Presentation Planner; Software
Online access:	Full text

Description
Summary:	We introduce the notion of symmetric multimodality for dialogue systems in which all input modes (e.g., speech, gesture, facial expression) are also available for output, and vice versa. A dialogue system with symmetric multimodality must not only understand and represent the user’s multimodal input, but also its own multimodal output. We present the SmartKom system, which provides full symmetric multimodality in a mixed-initiative dialogue system with an embodied conversational agent. SmartKom represents a new generation of multimodal dialogue systems that deal not only with simple modality integration and synchronization, but also cover the full spectrum of dialogue phenomena associated with symmetric multimodality (including crossmodal references, one-anaphora, and backchannelling). We show that SmartKom’s plug-and-play architecture supports multiple recognizers for a single modality; e.g., the user’s speech signal can be processed by three unimodal recognizers in parallel (speech recognition, emotional prosody, boundary prosody). Finally, we detail SmartKom’s three-tiered representation of multimodal discourse, consisting of a domain layer, a discourse layer, and a modality layer.
ISSN:	0302-9743 1611-3349
DOI:	10.1007/978-3-540-39451-8_1