Continual Auxiliary Task Learning
Format: Article
Language: English
Abstract: Learning auxiliary tasks, such as multiple predictions about the world, can provide many benefits to reinforcement learning systems. A variety of off-policy learning algorithms have been developed to learn such predictions, but as yet there is little work on how to adapt the behavior to gather useful data for those off-policy predictions. In this work, we investigate a reinforcement learning system designed to learn a collection of auxiliary tasks, with a behavior policy that learns to take actions to improve those auxiliary predictions. We highlight the inherent non-stationarity in this continual auxiliary task learning problem, for both the prediction learners and the behavior learner. We develop an algorithm based on successor features that facilitates tracking under non-stationary rewards, and prove that the separation into learning successor features and rewards provides convergence-rate improvements. We conduct an in-depth study of the resulting multi-prediction learning system.
DOI: 10.48550/arxiv.2202.11133
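
The abstract's central algorithmic idea is the separation of a slowly learned successor-feature component from a quickly re-tracked reward model. The sketch below is a minimal, hypothetical tabular illustration of that separation, not the authors' algorithm: the state space, one-hot features, function names, and step sizes are all assumptions made for the example. It shows why the split helps under non-stationary rewards: when the reward drifts, only the low-dimensional weight vector `w` must re-track, while the successor features `psi` remain valid.

```python
import numpy as np

# Minimal sketch (not the paper's implementation): tabular successor
# features with a separately learned reward model.
# Value estimate: q(s, a) = psi[s, a] . w, where psi accumulates the
# expected discounted future features and w regresses the reward.

n_states, n_actions = 10, 2
n_features = n_states          # one-hot state features (an assumption)
gamma = 0.9
alpha_psi = 0.1                # step size for successor features
alpha_w = 0.5                  # larger step size lets w track reward drift

phi = np.eye(n_states)                              # feature map
psi = np.zeros((n_states, n_actions, n_features))   # successor features
w = np.zeros(n_features)                            # reward weights

def update(s, a, r, s_next, a_next):
    """One on-policy transition update for both components."""
    global w
    # TD update for successor features: expected discounted feature sums
    target = phi[s] + gamma * psi[s_next, a_next]
    psi[s, a] += alpha_psi * (target - psi[s, a])
    # Linear regression of the observed reward onto the features;
    # this small problem re-tracks quickly when the reward changes
    w += alpha_w * (r - phi[s] @ w) * phi[s]

def q_value(s, a):
    """Recombine the two parts; adapts as soon as w re-tracks."""
    return psi[s, a] @ w
```

In this decomposition the non-stationarity is confined to the reward-regression step, which is why the paper can analyze the successor-feature and reward estimates separately when proving convergence-rate improvements.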