A Study of State Aliasing in Structured Prediction with RNNs
Format: Article
Language: English
Abstract: End-to-end reinforcement learning agents learn a state representation and a policy at the same time. Recurrent neural networks (RNNs) have been trained successfully as reinforcement learning agents in settings like dialogue that require structured prediction. In this paper, we investigate the representations learned by RNN-based agents when trained with both policy gradient and value-based methods. We show through extensive experiments and analysis that, when trained with policy gradient, recurrent neural networks often fail to learn a state representation that leads to an optimal policy in settings where the same action should be taken at different states. To explain this failure, we highlight the problem of state aliasing, which entails conflating two or more distinct states in the representation space. We demonstrate that state aliasing occurs when several states share the same optimal action and the agent is trained via policy gradient. We characterize this phenomenon through experiments on a simple maze setting and a more complex text-based game, and make recommendations for training RNNs with reinforcement learning.
DOI: 10.48550/arxiv.1906.09310
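
The abstract above describes state aliasing informally. The sketch below is a minimal, hypothetical illustration (not code from the paper): an RNN policy trained with REINFORCE on a toy two-step task in which two distinct start states share the same optimal first action but require different second actions. The environment, reward values, and the names RNNPolicy and run_episode are assumptions made for illustration only; the paper's maze and text-game experiments are not reproduced here.

```python
# Hypothetical toy setup to illustrate state aliasing; not the paper's environment.
import torch
import torch.nn as nn

torch.manual_seed(0)

class RNNPolicy(nn.Module):
    def __init__(self, n_obs=4, n_actions=2, hidden=16):
        super().__init__()
        self.hidden = hidden
        self.rnn = nn.GRUCell(n_obs, hidden)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, obs, h):
        h = self.rnn(obs, h)
        return torch.distributions.Categorical(logits=self.head(h)), h

def one_hot(i, n=4):
    v = torch.zeros(1, n)
    v[0, i] = 1.0
    return v

def run_episode(policy, start):
    """Two-step toy episode. Both start states (0 and 1) share the same
    optimal first action, but the optimal second action depends on which
    start state was observed, so the agent must keep them distinct."""
    h = torch.zeros(1, policy.hidden)
    log_probs = []
    reward = 0.0

    dist, h = policy(one_hot(start), h)
    a1 = dist.sample()
    log_probs.append(dist.log_prob(a1))
    reward += 1.0 if a1.item() == 0 else 0.0      # shared optimal first action

    dist, h = policy(one_hot(2), h)               # identical second observation
    a2 = dist.sample()
    log_probs.append(dist.log_prob(a2))
    reward += 1.0 if a2.item() == start else 0.0  # requires remembering the start

    return torch.cat(log_probs).sum(), reward

policy = RNNPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=1e-2)

for step in range(2000):
    log_prob, ret = run_episode(policy, start=step % 2)
    loss = -log_prob * ret                        # REINFORCE without a baseline
    opt.zero_grad()
    loss.backward()
    opt.step()

# Diagnostic: compare the hidden states produced by the two start observations.
# A cosine similarity near 1.0 suggests the two start states have been aliased.
with torch.no_grad():
    h0 = policy.rnn(one_hot(0), torch.zeros(1, policy.hidden))
    h1 = policy.rnn(one_hot(1), torch.zeros(1, policy.hidden))
    print("cosine similarity of hidden states:",
          nn.functional.cosine_similarity(h0, h1).item())
```

The toy task is built so that the shared optimal first action gives the policy-gradient objective no immediate incentive to keep the two start states apart in the hidden representation; if their hidden states collapse, the second-step decision cannot depend on the start state, which mirrors the failure mode the abstract attributes to state aliasing.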