Emergence of belief-like representations through reinforcement learning

To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	PLoS computational biology 2023-09, Vol.19 (9), p.e1011067
Hauptverfasser:	Hennig, Jay A, Romero Pinto, Sandra A, Yamaguchi, Takahiro, Linderman, Scott W, Uchida, Naoshige, Gershman, Samuel J
Format:	Artikel
Sprache:	eng
Schlagworte:	Analysis Animal models Animals Bayes Theorem Bayesian analysis Biology and Life Sciences Computer and Information Sciences Dopamine Dynamical systems Explicit knowledge Learning Learning in animals Mathematical models Medicine and Health Sciences Neural networks Neural Networks, Computer Observability (systems) Physical Sciences Random variables Recurrent neural networks Reinforcement Reinforcement, Psychology Representations Reward Social Sciences
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	To behave adaptively, animals must learn to predict future reward, or value. To do this, animals are thought to learn reward predictions using reinforcement learning. However, in contrast to classical models, animals must learn to estimate value using only incomplete state information. Previous work suggests that animals estimate value in partially observable tasks by first forming "beliefs"-optimal Bayesian estimates of the hidden states in the task. Although this is one way to solve the problem of partial observability, it is not the only way, nor is it the most computationally scalable solution in complex, real-world environments. Here we show that a recurrent neural network (RNN) can learn to estimate value directly from observations, generating reward prediction errors that resemble those observed experimentally, without any explicit objective of estimating beliefs. We integrate statistical, functional, and dynamical systems perspectives on beliefs to show that the RNN's learned representation encodes belief information, but only when the RNN's capacity is sufficiently large. These results illustrate how animals can estimate value in tasks without explicitly estimating beliefs, yielding a representation useful for systems with limited capacity.
ISSN:	1553-7358 1553-734X 1553-7358
DOI:	10.1371/journal.pcbi.1011067