A task-performance evaluation of referring expressions in situated collaborative task dialogues
Published in: Language Resources and Evaluation, 2013-12, Vol. 47 (4), p. 1285-1304
Main authors: , , , ,
Format: Article
Language: eng
Online access: Full text
Abstract: Appropriate evaluation of referring expressions is critical for the design of systems that can effectively collaborate with humans. A widely used method is to simply evaluate the degree to which an algorithm can reproduce the same expressions as those in previously collected corpora. Several researchers, however, have noted the need for a task-performance evaluation that measures the effectiveness of a referring expression in achieving a given task goal. This is particularly important in collaborative situated dialogues. Using referring expressions produced by six pairs of Japanese speakers collaboratively solving Tangram puzzles, we conducted a task-performance evaluation of referring expressions with 36 human evaluators. In particular, we focused on the evaluation of demonstrative pronouns generated by a machine learning-based algorithm. Comparing the results of this task-performance evaluation with the results of a previously conducted corpus-matching evaluation (Spanger et al. in Lang Resour Eval, 2010b), we confirmed the limitations of a corpus-matching evaluation and discuss the need for a task-performance evaluation.
ISSN: 1574-020X, 1572-8412, 1574-0218
DOI: 10.1007/s10579-013-9240-5