Data-driven models for timing feedback responses in a Map Task dialogue system


Detailed description

Bibliographic details
Published in:	Computer Speech & Language 2014-07, Vol.28 (4), p.903-922
Main authors:	Meena, Raveesh; Skantze, Gabriel; Gustafson, Joakim
Format:	Article
Language:	English
Subjects:	Applied linguistics; Computational linguistics; Linguistics; Mathematics and linguistics; Quantitative studies; Spoken dialogue systems; Timing feedback; Turn-taking; User evaluation
Online access:	Full text

Description
Abstract:
• We train models for detecting suitable feedback locations in the user's speech.
• We evaluated the trained model in a dialogue system through real user interactions.
• We exploit prosodic, lexico-syntactic and contextual cues for online detection.

Traditional dialogue systems use a fixed silence threshold to detect the end of users’ turns. Such a simplistic model can result in system behaviour that is both interruptive and unresponsive, which in turn affects user experience. Various studies have observed that human interlocutors take cues from speaker behaviour, such as prosody, syntax, and gestures, to coordinate smooth exchange of speaking turns. However, little effort has been made towards implementing these models in dialogue systems and verifying how well they model the turn-taking behaviour in human–computer interactions. We present a data-driven approach to building models for online detection of suitable feedback response locations in the user's speech. We first collected human–computer interaction data using a spoken dialogue system that can perform the Map Task with users (albeit using a trick). On this data, we trained various models that use automatically extractable prosodic, contextual and lexico-syntactic features for detecting response locations. Next, we implemented a trained model in the same dialogue system and evaluated it in interactions with users. The subjective and objective measures from the user evaluation confirm that a model trained on speaker behavioural cues offers both smoother turn-transitions and more responsive system behaviour.
ISSN:	0885-2308, 1095-8363
DOI:	10.1016/j.csl.2014.02.002