Mesolimbic dopamine adapts the rate of learning from action

Bibliographic details
Published in:	Nature (London) 2023-02, Vol.614 (7947), p.294-302
Main authors:	Coddington, Luke T.; Lindo, Sarah E.; Dudman, Joshua T.
Format:	Article
Language:	English
Subjects:	14/35; 631/378/116/2396; 631/378/1595/1395; 631/378/1788; 631/378/87; 64/60; 9/10; Agents (artificial intelligence); Algorithms; Animal behavior; Animal models; Animals; Behavior, Animal; Conditioning, Psychological; Cues; Datasets as Topic; Dopamine; Dopamine - metabolism; Dopamine receptors; Error signals; Generalized linear models; Head; Humanities and Social Sciences; Hypotheses; Learning; Machine learning; Mesolimbic system; Mice; Movement; multidisciplinary; Neural networks; Neural Networks, Computer; Neural Pathways; Policies; Reinforcement; Reinforcement, Psychology; Reward; Science; Science (multidisciplinary); Success; Teaching methods; Trace conditioning
Online access:	Full text

Description
Summary:	Recent success in training artificial agents and robots derives from a combination of direct learning of behavioural policies and indirect learning through value functions [1–3]. Policy learning and value learning use distinct algorithms that optimize behavioural performance and reward prediction, respectively. In animals, behavioural learning and the role of mesolimbic dopamine signalling have been extensively evaluated with respect to reward prediction [4]; however, so far there has been little consideration of how direct policy learning might inform our understanding [5]. Here we used a comprehensive dataset of orofacial and body movements to understand how behavioural policies evolved as naive, head-restrained mice learned a trace conditioning paradigm. Individual differences in initial dopaminergic reward responses correlated with the emergence of learned behavioural policy, but not the emergence of putative value encoding for a predictive cue. Likewise, physiologically calibrated manipulations of mesolimbic dopamine produced several effects inconsistent with value learning but predicted by a neural-network-based model that used dopamine signals to set an adaptive rate, not an error signal, for behavioural policy learning. This work provides strong evidence that phasic dopamine activity can regulate direct learning of behavioural policies, expanding the explanatory power of reinforcement learning models for animal learning [6]. Analysis of data collected from mice learning a trace conditioning paradigm shows that phasic dopamine activity in the brain can regulate direct learning of behavioural policies, and dopamine sets an adaptive learning rate rather than an error-like teaching signal.
ISSN:	0028-0836, 1476-4687
DOI:	10.1038/s41586-022-05614-z
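
The summary contrasts two roles a dopamine-like signal could play in policy learning: acting as the error-like teaching signal itself, or setting an adaptive learning rate for an update driven by something else. The sketch below is a toy illustration of that distinction only. It is not the neural-network model from the article. The one-parameter Bernoulli policy, the function name `policy_update`, the `mode` labels, the `base_lr` value, and the choice to clamp negative dopamine to a zero rate are all assumptions made for illustration.

```python
import math


def sigmoid(x):
    """Logistic function mapping a real number to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def policy_update(theta, reward, action, dopamine, mode, base_lr=0.5):
    """One REINFORCE-style step for a Bernoulli policy p(act) = sigmoid(theta).

    Hypothetical toy, not the article's model.
    mode="error": dopamine is used as the teaching signal.
    mode="rate":  reward is the teaching signal; dopamine scales the step size.
    """
    p = sigmoid(theta)
    eligibility = action - p  # gradient of log pi(action) with respect to theta
    if mode == "error":
        return theta + base_lr * dopamine * eligibility
    if mode == "rate":
        # Assumption: a learning rate cannot be negative, so clamp at zero.
        lr = base_lr * max(dopamine, 0.0)
        return theta + lr * reward * eligibility
    raise ValueError("mode must be 'error' or 'rate'")


if __name__ == "__main__":
    # Same trial in both modes: action taken, no reward, large dopamine signal.
    for mode in ("error", "rate"):
        new_theta = policy_update(0.0, reward=0.0, action=1, dopamine=2.0, mode=mode)
        print(mode, new_theta)
```

In this toy, a large dopamine signal on an unrewarded trial changes the policy when dopamine is treated as the error, but leaves it unchanged when dopamine only scales the rate, because the reward-driven update it scales is zero. On rewarded trials the rate variant takes larger steps as the dopamine signal grows.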