Novelty detection improves performance of reinforcement learners in fluctuating, partially observable environments


Bibliographic Details
Published in: Journal of Theoretical Biology, 2019-09, Vol. 477, pp. 44-50
Author: Marzen, Sarah E.
Format: Article
Language: English
Online Access: Full text
Description
Abstract:
•In fluctuating environments, reward optimizers can get stuck in suboptimal saddle points, gathering suboptimal reward.
•It is relatively easy to build a novelty detector that recognizes when the environment has changed.
•Reward optimizers with novelty detectors do better than those without by reinitializing their weights when the environment changes.

Evolved and engineered organisms must adapt to fluctuating environments that are often only partially observed. We show that adaptation to a second environment can be significantly harder after adapting to a first, completely unrelated environment, even when using second-order learning algorithms and a constant learning rate. In effect, there is a lack of fading memory in the organism's performance. However, organisms can adapt well to the second environment by incorporating a simple novelty detection algorithm that signals when the environment has changed and by reinitializing the parameters that define their behavior when it does. We propose that it may be fruitful to look for signs of this novelty detection in biological organisms, and to engineer novelty detection algorithms into artificial organisms.
ISSN: 0022-5193
1095-8541
DOI: 10.1016/j.jtbi.2019.06.007
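This record contains only the abstract, not the paper's implementation. As a rough illustration of the highlighted idea (a novelty detector that triggers weight reinitialization when the environment changes), the Python sketch below monitors recent prediction error with a simple threshold test on a toy fluctuating linear environment. The detector statistic, window size, threshold, and learning rule are all assumptions chosen for illustration, not the paper's actual algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
n_params = 10

def reinitialize():
    """Fresh random weights, standing in for the organism's parameters."""
    return rng.normal(scale=0.1, size=n_params)

class NoveltyDetector:
    """Flag an environment change when the mean prediction error over a
    recent window exceeds the running baseline by more than `k` baseline
    standard deviations. Illustrative heuristic, not the paper's detector."""
    def __init__(self, window=100, k=4.0):
        self.window, self.k = window, k
        self.history = []

    def update(self, error):
        self.history.append(error)
        if len(self.history) < 2 * self.window:
            return False                              # not enough history yet
        baseline = np.asarray(self.history[:-self.window])
        recent = np.asarray(self.history[-self.window:])
        z = (recent.mean() - baseline.mean()) / (baseline.std() + 1e-8)
        if z > self.k:
            self.history = []                         # reset after a detection
            return True
        return False

# Toy fluctuating environment: the true mapping switches halfway through.
true_w_1 = rng.normal(size=n_params)
true_w_2 = rng.normal(size=n_params)

weights = reinitialize()
detector = NoveltyDetector()

for step in range(20_000):
    true_w = true_w_1 if step < 10_000 else true_w_2
    x = rng.normal(size=n_params)              # observation
    target = true_w @ x + 0.01 * rng.normal()  # noisy response signal
    error = target - weights @ x
    weights += 0.01 * error * x                # simple LMS-style update
    if detector.update(abs(error)):
        weights = reinitialize()               # restart learning after change
        print(f"change detected at step {step}; weights reinitialized")
```

In this sketch the detector stays silent while the learner is still converging (recent error is below the baseline), fires once when the environment switches and the error jumps, and the reinitialized weights then adapt to the second environment rather than remaining stuck near the first solution.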