VISITOR: Visual Interactive State Sequence Exploration for Reinforcement Learning

Bibliographic details
Published in:	Computer Graphics Forum, 2023-06, Vol. 42 (3), pp. 397-408
Main authors:	Metz, Yannick; Bykovets, Eugene; Joos, Lucas; Keim, Daniel; El‐Assady, Mennatallah
Format:	Article
Language:	English
Subjects:	Annotations; CCS Concepts; Computing methodologies → Reinforcement learning; Decision analysis; Deep learning; Embedding; Human‐centered computing → Visual analytics; Multiscale analysis; Sequences
Online access:	Full text

Description
Abstract:	Understanding the behavior of deep reinforcement learning agents is a crucial requirement throughout their development. Existing work has addressed the identification of observable behavioral patterns in state sequences or analysis of isolated internal representations; however, the overall decision‐making of deep‐learning RL agents remains opaque. To tackle this, we present VISITOR, a visual analytics system enabling the analysis of entire state sequences, the diagnosis of singular predictions, and the comparison between agents. A sequence embedding view enables the multiscale analysis of state sequences, utilizing custom embedding techniques for a stable spatialization of the observations and internal states. We provide multiple layers: (1) a state space embedding, highlighting different groups of states inside the state‐action sequences, (2) a trajectory view, emphasizing decision points, (3) a network activation mapping, visualizing the relationship between observations and network activations, (4) a transition embedding, enabling the analysis of state‐to‐state transitions. The embedding view is accompanied by an interactive reward view that captures the temporal development of metrics, which can be linked directly to states in the embedding. Lastly, a model list allows for the quick comparison of models across multiple metrics. Annotations can be exported to communicate results to different audiences. Our two‐stage evaluation with eight experts confirms the effectiveness in identifying states of interest, comparing the quality of policies, and reasoning about the internal decision‐making processes.
ISSN:	0167-7055, 1467-8659
DOI:	10.1111/cgf.14839