The asymmetric learning rates of murine exploratory behavior in sparse reward environments


Bibliographic details

Published in: Neural Networks, 2021-11, Vol. 143, p. 218-229
Authors: Ohta, Hiroyuki; Satori, Kuniaki; Takarada, Yu; Arake, Masashi; Ishizuka, Toshiaki; Morimoto, Yuji; Takahashi, Tatsuji
Format: Article
Language: English
Subjects: Behavior; Dual learning rate; Exploration; Multi-armed bandit problem; Reinforcement learning
Online access: Full text
Abstract

Goal-oriented behaviors of animals can be modeled by reinforcement learning algorithms. Such algorithms predict the future outcomes of selected actions using action values and update those values in response to positive and negative outcomes. In many models of animal behavior, the action values are updated symmetrically with a common learning rate, that is, in the same way for both positive and negative outcomes. However, animals in environments with scarce rewards may have uneven learning rates. To investigate this asymmetry between reward and non-reward, we analyzed the exploration behavior of mice in five-armed bandit tasks using a Q-learning model with separate learning rates for positive and negative outcomes. The positive learning rate was significantly higher in a scarce reward environment than in a rich reward environment, and conversely, the negative learning rate was significantly lower in the scarce environment. The ratio of the positive to the negative learning rate was about 10 in the scarce environment and about 2 in the rich environment. This result suggests that when the reward probability is low, mice tend to ignore failures and exploit the rare rewards. Computational modeling analysis revealed that the increased learning-rate ratio could cause overestimation of, and perseveration on, rarely rewarding events, increasing total reward acquisition in the scarce environment but disadvantaging impartial exploration.
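The abstract describes a Q-learning model with separate learning rates for positive and negative outcomes on a five-armed bandit. A minimal, illustrative Python sketch of that model class follows; it is not the authors' implementation. The softmax choice rule, the inverse temperature, the arm reward probabilities, and the absolute learning-rate values are assumptions, with only the roughly 10:1 positive-to-negative ratio in the scarce environment taken from the reported results.

    import numpy as np

    def softmax(q, beta):
        # Convert action values into choice probabilities (beta = inverse temperature).
        e = np.exp(beta * (q - q.max()))
        return e / e.sum()

    def run_bandit(reward_probs, alpha_pos, alpha_neg, beta=3.0, n_trials=1000, seed=0):
        # Q-learning with asymmetric learning rates: alpha_pos is applied when the
        # reward prediction error is positive, alpha_neg when it is negative.
        rng = np.random.default_rng(seed)
        q = np.zeros(len(reward_probs))  # one action value per arm
        total_reward = 0.0
        for _ in range(n_trials):
            a = rng.choice(len(q), p=softmax(q, beta))     # sample an arm
            r = float(rng.random() < reward_probs[a])      # Bernoulli reward
            delta = r - q[a]                               # reward prediction error
            alpha = alpha_pos if delta > 0 else alpha_neg  # asymmetric update
            q[a] += alpha * delta
            total_reward += r
        return q, total_reward

    # Hypothetical scarce environment with a roughly 10:1 learning-rate ratio,
    # mirroring the ratio reported for mice (absolute values are illustrative).
    q_scarce, reward_scarce = run_bandit(
        reward_probs=[0.1, 0.05, 0.05, 0.05, 0.05],
        alpha_pos=0.5, alpha_neg=0.05)

With alpha_pos much larger than alpha_neg, a single rare reward pulls an arm's value up sharply while repeated failures erode it only slowly, which is the overestimation-and-perseveration effect the authors describe.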
DOI: 10.1016/j.neunet.2021.05.030
ISSN: 0893-6080
EISSN: 1879-2782
Record ID: cdi_proquest_miscellaneous_2544458615
Source: Elsevier ScienceDirect Journals
URL: https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-02T06%3A22%3A54IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=The%20asymmetric%20learning%20rates%20of%20murine%20exploratory%20behavior%20in%20sparse%20reward%20environments&rft.jtitle=Neural%20networks&rft.au=Ohta,%20Hiroyuki&rft.date=2021-11&rft.volume=143&rft.spage=218&rft.epage=229&rft.pages=218-229&rft.issn=0893-6080&rft.eissn=1879-2782&rft_id=info:doi/10.1016/j.neunet.2021.05.030&rft_dat=%3Cproquest_cross%3E2544458615%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2544458615&rft_id=info:pmid/&rft_els_id=S0893608021002264&rfr_iscdi=true