The role of state uncertainty in the dynamics of dopamine

Bibliographic details
Published in: Current biology 2022-03, Vol.32 (5), p.1077-1087.e9
Main authors: Mikhael, John G., Kim, HyungGoo R., Uchida, Naoshige, Gershman, Samuel J.
Format: Article
Language: English
Description
Summary: Reinforcement learning models of the basal ganglia map the phasic dopamine signal to reward prediction errors (RPEs). Conventional models assert that, when a stimulus predicts a reward with fixed delay, dopamine activity during the delay should converge to baseline through learning. However, recent studies have found that dopamine ramps up before reward in certain conditions even after learning, thus challenging the conventional models. In this work, we show that sensory feedback causes an unbiased learner to produce RPE ramps. Our model predicts that when feedback gradually decreases during a trial, dopamine activity should resemble a “bump,” whose ramp-up phase should, furthermore, be greater than that of conditions where the feedback stays high. We trained mice on a virtual navigation task with varying brightness, and both predictions were empirically observed. In sum, our theoretical and experimental results reconcile the seemingly conflicting data on dopamine behaviors under the RPE hypothesis.

• Dopamine (DA) ramps have challenged the reward prediction error (RPE) hypothesis
• We provide a normative theory on how RPEs can ramp up in a task-dependent manner
• Sensory feedback causes RPEs to ramp up over the course of a trial
• Gradually weakening sensory feedback caused a DA “bump” as our model predicts

Dopamine serves as a “reward prediction error” (RPE) that facilitates learning. Mikhael et al. argue that in the presence of sensory feedback, an unbiased learner will produce RPE ramps. This view predicts a previously unobserved dopamine behavior, a dopamine “bump,” which is empirically validated using a virtual reality task in mice.
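
The conventional prediction mentioned in the summary (RPEs during a fixed delay converging to baseline with learning) can be illustrated with a minimal tabular TD(0) sketch. This is not the authors' model: the function name `td_learning_rpes`, the chain of discrete delay states, the unit reward, and all parameter values are assumptions chosen for illustration, and the sketch deliberately omits state uncertainty and sensory feedback, which are the ingredients the paper adds.

```python
import numpy as np


def td_learning_rpes(n_steps=10, gamma=0.9, alpha=0.1, n_trials=2000):
    """Tabular TD(0) on a hypothetical fixed-delay cue -> reward chain.

    States 0..n_steps-1 tile the delay after the cue. A reward of 1 is
    delivered on leaving the last delay state. All names and parameter
    values are illustrative assumptions, not taken from the paper.

    Returns the learned values of the delay states and the RPEs
    recorded on the final trial.
    """
    values = np.zeros(n_steps + 1)  # last entry: terminal state, value 0
    rpes = np.zeros(n_steps)
    for _ in range(n_trials):
        for t in range(n_steps):
            reward = 1.0 if t == n_steps - 1 else 0.0
            # Standard TD error: delta = r + gamma * V(s') - V(s)
            rpes[t] = reward + gamma * values[t + 1] - values[t]
            values[t] += alpha * rpes[t]
    return values[:n_steps], rpes


if __name__ == "__main__":
    values, rpes = td_learning_rpes()
    print("learned values:", np.round(values, 3))
    print("largest |RPE| during the delay:", np.abs(rpes).max())
```

In this toy setting the learned values rise toward the reward as powers of `gamma`, and the within-delay RPEs shrink toward zero. The sketch does not model the RPE at cue onset, and it contains nothing that would produce the ramps or the “bump” described in the summary.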
ISSN: 0960-9822, 1879-0445
DOI: 10.1016/j.cub.2022.01.025