Does phasic dopamine release cause policy updates?

Bibliographic details
Published in: The European journal of neuroscience 2024-03, Vol.59 (6), p.1260-1277
Main authors: Carter, Francis; Cossette, Marie‐Pierre; Trujillo‐Pisanty, Ivan; Pallikaras, Vasilios; Breton, Yannick‐André; Conover, Kent; Caplan, Jill; Solis, Pavel; Voisard, Jacques; Yaksich, Alexandra; Shizgal, Peter
Format: Article
Language: English
Online access: Full text
Description
Abstract: Phasic dopamine activity is believed to both encode reward‐prediction errors (RPEs) and to cause the adaptations that these errors engender. If so, a rat working for optogenetic stimulation of dopamine neurons will repeatedly update its policy and/or action values, thus iteratively increasing its work rate. Here, we challenge this view by demonstrating stable, non‐maximal work rates in the face of repeated optogenetic stimulation of midbrain dopamine neurons. Furthermore, we show that rats learn to discriminate between world states distinguished only by their history of dopamine activation. Comparison of these results to reinforcement learning simulations suggests that the induced dopamine transients acted more as rewards than RPEs. However, pursuit of dopaminergic stimulation drifted upwards over a time scale of days and weeks, despite its stability within trials. To reconcile the results with prior findings, we consider multiple roles for dopamine signalling.

Phasic dopamine (DA) firing is believed to encode reward‐prediction errors (RPEs) that drive learning. If so, a rat working for optogenetic stimulation of DA neurons will repeatedly update its action values, thus iteratively increasing its work rate (right). Instead, work rates stabilised for many minutes (left). These results are congruent with simulations in which DA bursts act as rewards (centre) and differ sharply from simulations in which DA bursts act as RPEs (right).
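
The contrast drawn in the abstract between dopamine bursts acting as rewards and acting as RPEs can be illustrated with a toy simulation. The sketch below is not the authors' model: the single action value `q`, the logistic link `work_rate`, and every parameter value (`alpha`, `magnitude`, `q_half`, `slope`) are assumptions chosen only for illustration. In "reward" mode the stimulation is treated as a reward, so the error shrinks as the value converges and the work rate settles at a non-maximal level. In "rpe" mode the stimulation is treated as a fixed error injected on every trial, so the value keeps growing and the work rate climbs toward its ceiling.

```python
import math


def work_rate(q, q_half=1.0, slope=4.0):
    """Map an action value to a work rate in [0, 1].

    Hypothetical logistic link; q_half and slope are illustrative only.
    """
    return 1.0 / (1.0 + math.exp(-slope * (q - q_half)))


def simulate(mode, n_stim=200, alpha=0.1, magnitude=1.0):
    """Return the work rate after each of n_stim stimulations.

    mode == "reward": the stimulation is a reward of size `magnitude`,
                      and the error is computed as reward minus value.
    mode == "rpe":    the stimulation directly imposes an error of size
                      `magnitude`, regardless of the current value.
    """
    q = 0.0  # value of the "work" action (assumed to start at zero)
    rates = []
    for _ in range(n_stim):
        if mode == "reward":
            delta = magnitude - q  # error vanishes as q approaches the reward
        elif mode == "rpe":
            delta = magnitude      # error never vanishes
        else:
            raise ValueError(f"unknown mode: {mode!r}")
        q += alpha * delta         # standard delta-rule update
        rates.append(work_rate(q))
    return rates


reward_rates = simulate("reward")
rpe_rates = simulate("rpe")
print(f"final work rate, stimulation as reward: {reward_rates[-1]:.3f}")
print(f"final work rate, stimulation as RPE:    {rpe_rates[-1]:.3f}")
```

Under these assumptions, only the "reward" mode produces the stable, non-maximal work rate described in the abstract. The slower upward drift over days and weeks that the abstract also reports is not captured by this sketch.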
ISSN: 0953-816X, 1460-9568
DOI: 10.1111/ejn.16199