Variational Denoising Autoencoders and Least-Squares Policy Iteration for Statistical Dialogue Managers


Bibliographic Details
Published in: IEEE Signal Processing Letters, 2020, Vol. 27, pp. 960-964
Authors: Diakoloukas, Vassilios; Lygerakis, Fotios; Lagoudakis, Michail G.; Kotti, Margarita
Format: Article
Language: English
Description
Abstract: The use of Reinforcement Learning (RL) approaches for dialogue policy optimization has become the prevailing trend in dialogue management systems. Several methods have been proposed, which are trained on dialogue data to provide optimal system responses. However, most of these approaches exhibit performance degradation in the presence of noise, poor scalability to other domains, and performance instabilities. To overcome these problems, we propose a novel approach based on the incremental, sample-efficient Least-Squares Policy Iteration (LSPI) algorithm, which is trained on compact, fixed-size dialogue state encodings obtained from deep Variational Denoising Autoencoders (VDAE). The proposed scheme exhibits stable and noise-robust performance, which significantly outperforms the current state-of-the-art, even in mismatched noise environments.
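The LSPI algorithm named in the abstract alternates least-squares policy evaluation (LSTDQ) with greedy policy improvement over a linear Q-function. The sketch below is not the authors' dialogue-manager implementation; it is a minimal illustrative version on a hypothetical two-state toy MDP with one-hot features standing in for the VDAE state encodings.

```python
import numpy as np

N_STATES, N_ACTIONS, GAMMA = 2, 2, 0.9

def phi(s, a):
    # One-hot feature for the (state, action) pair; in the paper's setting
    # this would be a compact VDAE encoding of the dialogue state instead.
    f = np.zeros(N_STATES * N_ACTIONS)
    f[s * N_ACTIONS + a] = 1.0
    return f

def lstdq(samples, policy):
    # Solve A w = b, where A = sum phi (phi - gamma phi')^T and b = sum r phi,
    # giving the least-squares fixed point of Q^pi(s,a) = phi(s,a)^T w.
    dim = N_STATES * N_ACTIONS
    A, b = np.zeros((dim, dim)), np.zeros(dim)
    for s, a, r, s_next in samples:
        p, p_next = phi(s, a), phi(s_next, policy[s_next])
        A += np.outer(p, p - GAMMA * p_next)
        b += r * p
    return np.linalg.solve(A + 1e-6 * np.eye(dim), b)  # small ridge for stability

def lspi(samples, iters=20):
    # Policy iteration: evaluate with LSTDQ, then act greedily w.r.t. Q.
    policy = np.zeros(N_STATES, dtype=int)
    for _ in range(iters):
        w = lstdq(samples, policy)
        new_policy = np.array([
            np.argmax([phi(s, a) @ w for a in range(N_ACTIONS)])
            for s in range(N_STATES)
        ])
        if np.array_equal(new_policy, policy):
            break  # policy is stable
        policy = new_policy
    return policy, w

# Toy transitions (s, a, r, s'): reward 1 only for staying in state 1,
# so the optimal policy is "move" in state 0 and "stay" in state 1.
samples = [(0, 0, 0.0, 0), (0, 1, 0.0, 1), (1, 0, 1.0, 1), (1, 1, 0.0, 0)]
policy, w = lspi(samples)
```

Because LSTDQ reuses the same fixed batch of samples at every iteration, LSPI is sample-efficient, which is what makes it attractive for dialogue data that are expensive to collect.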
ISSN: 1070-9908, 1558-2361
DOI: 10.1109/LSP.2020.2998361