Adversarial learning of neural user simulators for dialogue policy optimisation
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Reinforcement learning based dialogue policies are typically trained in
interaction with a user simulator. To obtain an effective and robust policy,
this simulator should generate user behaviour that is both realistic and
varied. Current data-driven simulators are trained to accurately model the user
behaviour in a dialogue corpus. We propose an alternative method using
adversarial learning, with the aim of simulating realistic user behaviour with
more variation. We train and evaluate several simulators on a corpus of
restaurant search dialogues, and then use them to train dialogue system
policies. In policy cross-evaluation experiments we demonstrate that an
adversarially trained simulator produces policies with 8.3% higher success rate
than those trained with a maximum likelihood simulator. Subjective results from
a crowd-sourced dialogue system user evaluation confirm the effectiveness of
adversarially training user simulators. |
---|---|
DOI: | 10.48550/arxiv.2306.00858 |