Exploiting Language Instructions for Interpretable and Compositional Reinforcement Learning
Format: Article
Language: English
Abstract: In this work, we present an alternative approach to making an agent compositional through the use of a diagnostic classifier. Because of the need for explainable agents in automated decision processes, we attempt to interpret the latent space of an RL agent to identify its current objective within a complex language instruction. Results show that the classification process causes changes in the hidden states which make them more easily interpretable, but also causes a shift in zero-shot performance on novel instructions. Lastly, we limit the supervisory signal on the classification and observe a similar but less pronounced effect.
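The abstract's core technique — a diagnostic (probing) classifier reading an agent's hidden states to recover its current sub-goal — can be sketched as follows. Everything here is illustrative, not the paper's actual architecture: the hidden states and objective labels are synthetic stand-ins, and the probe is a plain linear softmax classifier trained by gradient descent.

```python
# Hedged sketch: a linear diagnostic classifier probing "hidden states"
# to predict which sub-goal of an instruction is currently active.
# All dimensions, data, and labels are synthetic assumptions.
import numpy as np

rng = np.random.default_rng(0)

n, d, n_goals = 1000, 32, 4                      # samples, hidden size, sub-goals
W_true = rng.normal(size=(d, n_goals))           # hypothetical ground-truth mapping
H = rng.normal(size=(n, d))                      # stand-in for agent hidden states
y = (H @ W_true).argmax(axis=1)                  # synthetic "current objective" labels

# Train a linear softmax probe with full-batch gradient descent.
W = np.zeros((d, n_goals))
for _ in range(500):
    logits = H @ W
    p = np.exp(logits - logits.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    p[np.arange(n), y] -= 1.0                    # dCE/dlogits = softmax - one-hot
    W -= 0.1 * (H.T @ p) / n

acc = ((H @ W).argmax(axis=1) == y).mean()
print(f"probe accuracy: {acc:.2f}")
```

High probe accuracy would indicate that the current objective is linearly decodable from the hidden states — the sense of "interpretable" the abstract refers to; the paper additionally uses the classification loss as a training signal, which this frozen-representation sketch does not model.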
DOI: 10.48550/arxiv.2001.04418