Novelty detection improves performance of reinforcement learners in fluctuating, partially observable environments
Published in: Journal of Theoretical Biology, 2019-09, Vol. 477, pp. 44–50
Format: Article
Language: English
Online Access: Full text
Abstract:
• In fluctuating environments, reward optimizers can get stuck at suboptimal saddle points, gathering suboptimal reward.
• It is relatively easy to build a novelty detector that recognizes when the environment has changed.
• Reward optimizers with novelty detectors do better than those without, by reinitializing their weights when the environment changes.

Evolved and engineered organisms must adapt to fluctuating environments that are often only partially observed. We show that adaptation to a second environment can be significantly harder after adapting to a first, completely unrelated environment, even when using second-order learning algorithms and a constant learning rate. In effect, there is a lack of fading memory in the organism's performance. However, organisms can adapt well to the second environment by incorporating a simple novelty detection algorithm that signals when the environment has changed, and reinitializing the parameters that define their behavior when it does. We propose that it may be fruitful to look for signs of this novelty detection in biological organisms, and to engineer novelty detection algorithms into artificial organisms.
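The mechanism the abstract describes (detect a change in the environment, then reinitialize the learner's parameters) can be sketched as follows. This is a minimal illustration, not the paper's actual setup: it assumes a simple linear LMS learner and a moving-average error-ratio detector, whereas the paper studies second-order learning in partially observable environments.

```python
import numpy as np


class NoveltyResetLearner:
    """A gradient learner that reinitializes its weights when a simple
    novelty detector flags an environment change.

    Illustrative sketch only: the learner is plain LMS regression, and
    the detector compares recent prediction error against a short
    baseline window; both choices are assumptions, not the paper's."""

    def __init__(self, dim, lr=0.05, window=20, threshold=2.0, seed=0):
        self.rng = np.random.default_rng(seed)
        self.dim, self.lr = dim, lr
        self.window, self.threshold = window, threshold
        self.errors = []  # recent absolute prediction errors
        self.w = self.rng.normal(0.0, 0.1, dim)

    def step(self, x, y):
        """One observation (x, y). Returns True if novelty was detected."""
        err = y - self.w @ x
        self.errors.append(abs(err))
        if len(self.errors) > self.window:
            self.errors.pop(0)
        if len(self.errors) == self.window:
            # Novelty detector: the last few errors are far above the
            # baseline of the rest of the window.
            recent = np.mean(self.errors[-5:])
            baseline = np.mean(self.errors[:-5]) + 1e-8
            if recent / baseline > self.threshold:
                # Environment appears to have changed: reinitialize.
                self.w = self.rng.normal(0.0, 0.1, self.dim)
                self.errors.clear()
                return True
        self.w += self.lr * err * x  # LMS-style gradient update
        return False
```

In use, the learner converges on one linear environment; when the target mapping is swapped for an unrelated one, the error ratio spikes, the weights are reinitialized, and the learner re-adapts instead of slowly unlearning the stale solution.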
ISSN: 0022-5193, 1095-8541
DOI: 10.1016/j.jtbi.2019.06.007