KddRES: A Multi-level Knowledge-driven Dialogue Dataset for Restaurant Towards Customized Dialogue System

Bibliographic details
Published in:	Computer Speech & Language 2024-08, Vol. 87, p. 101637, Article 101637
Main authors:	Wang, Hongru; Kwan, Wai-Chung; Li, Min; Zhou, Zimo; Wong, Kam-Fai
Format:	Article
Language:	English
Subjects:	Cantonese; Customized Dialogue System; Hierarchical Slots; Low-resource language; Task-oriented Dialogue System

Description
Abstract:	To alleviate the shortage of dialogue datasets for Cantonese, a low-resource language, and to facilitate the development of customized task-oriented dialogue systems, we propose KddRES, the first Cantonese Knowledge-driven dialogue dataset for REStaurants. It contains 834 multi-turn dialogues, 8000 utterances, and 26 distinct slots. The slots are hierarchical: beneath the 26 coarse-grained slots are 16 additional fine-grained slots. Annotations of dialogue states and dialogue actions on both the user and system sides are provided to suit multiple downstream tasks such as natural language understanding and dialogue state tracking. To detect hierarchical slots effectively, we propose a framework, HierBERT, which models label semantics and the relationships between different slots. Experimental results demonstrate that KddRES is more challenging than existing datasets because of its hierarchical slots, and that our framework is particularly effective at detecting secondary slots, achieving a new state of the art. Given the rich annotation and hierarchical slot structure of KddRES, we hope it will promote research on customized dialogue systems in Cantonese and on other conversational AI tasks, such as dialogue state tracking and policy learning.
• We build the first Cantonese human-to-human task-oriented dialogue dataset: KddRES.
• It contains 834 dialogues with around 8000 utterances and 26 hierarchical slots.
• We provide extensive annotations on both the user and system sides.
• We propose HierBERT, a BERT-based framework that utilizes hierarchical label semantics.
• We achieve new state-of-the-art results on natural language understanding tasks.
ISSN:	0885-2308, 1095-8363
DOI:	10.1016/j.csl.2024.101637