Complex problem solving with reinforcement learning
We previously measured human performance on a complex problem-solving task that involves finding which ball in a set is lighter or heavier than the others with a limited number of weighings. None of the participants found a correct solution within 30 minutes without the help of demonstrations or instructions.
Saved in:
Main authors: | Dandurand, F., Shultz, T.R., Rivest, F. |
---|---|
Format: | Conference Proceeding |
Language: | eng |
Subjects: | Biological system modeling; Biology computing; Cognition; Complex Cognition; Computational modeling; Feedback; Humans; Information processing; Learning; Problem Solving; Psychology; Reinforcement Learning |
Online access: | Order full text |
container_end_page | 162 |
---|---|
container_issue | |
container_start_page | 157 |
container_title | 2007 IEEE 6th International Conference on Development and Learning |
container_volume | |
creator | Dandurand, F.; Shultz, T.R.; Rivest, F. |
description | We previously measured human performance on a complex problem-solving task that involves finding which ball in a set is lighter or heavier than the others with a limited number of weighings. None of the participants found a correct solution within 30 minutes without the help of demonstrations or instructions. In this paper, we model human performance on this task using a biologically plausible computational model based on reinforcement learning. We use a SARSA-based Softmax learning algorithm in which the reward function is learned using cascade-correlation neural networks. First, we find that the task can be learned by reinforcement alone with substantial training. Second, we study the number of alternative actions available to Softmax and find that 5 works well for this problem, which is compatible with estimates of human working memory size. Third, we find that simulations are less accurate than humans given an equivalent amount of training. We suggest that humans use means-ends analysis to self-generate rewards in non-terminal states; implementing such self-generated rewards might improve model accuracy. Finally, we pretrain models to prefer simple actions, as humans do. We partially capture a simplicity bias, and find that it has little impact on accuracy. |
doi_str_mv | 10.1109/DEVLRN.2007.4354026 |
format | Conference Proceeding |
fulltext | fulltext_linktorsrc |
identifier | ISBN: 9781424411153; ISBN: 1424411157; EISSN: 2161-9476; EISBN: 1424411165; EISBN: 9781424411160 |
ispartof | 2007 IEEE 6th International Conference on Development and Learning, 2007, p.157-162 |
issn | 2161-9476 |
language | eng |
recordid | cdi_ieee_primary_4354026 |
source | IEEE Electronic Library (IEL) Conference Proceedings |
subjects | Biological system modeling; Biology computing; Cognition; Complex Cognition; Computational modeling; Feedback; Humans; Information processing; Learning; Problem Solving; Psychology; Reinforcement Learning |
title | Complex problem solving with reinforcement learning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-04T21%3A47%3A29IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-ieee_6IE&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=proceeding&rft.atitle=Complex%20problem%20solving%20with%20reinforcement%20learning&rft.btitle=2007%20IEEE%206th%20International%20Conference%20on%20Development%20and%20Learning&rft.au=Dandurand,%20F.&rft.date=2007-07&rft.spage=157&rft.epage=162&rft.pages=157-162&rft.eissn=2161-9476&rft.isbn=9781424411153&rft.isbn_list=1424411157&rft_id=info:doi/10.1109/DEVLRN.2007.4354026&rft_dat=%3Cieee_6IE%3E4354026%3C/ieee_6IE%3E%3Curl%3E%3C/url%3E&rft.eisbn=1424411165&rft.eisbn_list=9781424411160&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rft_ieee_id=4354026&rfr_iscdi=true |
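The abstract describes a SARSA-based Softmax learner whose reward function is approximated by cascade-correlation networks, with roughly five alternative actions considered at each step. As a rough illustration only, the sketch below shows softmax (Boltzmann) action selection over a small candidate set plus the standard SARSA update. The tabular Q store, temperature, learning rate, discount factor, and action names are all assumptions made for this sketch; in particular, the dictionary stands in for the paper's cascade-correlation function approximator.

```python
import math
import random

# Minimal sketch of SARSA with softmax (Boltzmann) action selection,
# loosely following the setup the abstract describes. All constants
# below are illustrative assumptions, not values from the paper.
TEMPERATURE = 1.0   # softmax temperature (assumed)
ALPHA = 0.1         # learning rate (assumed)
GAMMA = 0.9         # discount factor (assumed)
N_ALTERNATIVES = 5  # number of alternative actions, per the paper's finding

Q = {}  # Q-value table: (state, action) -> estimated return

def q(state, action):
    """Look up a Q-value, defaulting to 0 for unseen state-action pairs."""
    return Q.get((state, action), 0.0)

def softmax_choice(state, actions, temperature=TEMPERATURE):
    """Pick among the top few candidate actions with Boltzmann probabilities."""
    # Restrict the choice to a handful of alternatives, mirroring the
    # working-memory-sized action set studied in the paper.
    candidates = sorted(actions, key=lambda a: q(state, a), reverse=True)
    candidates = candidates[:N_ALTERNATIVES]
    weights = [math.exp(q(state, a) / temperature) for a in candidates]
    r = random.random() * sum(weights)
    for action, weight in zip(candidates, weights):
        r -= weight
        if r <= 0:
            return action
    return candidates[-1]

def sarsa_update(s, a, reward, s_next, a_next):
    """On-policy SARSA update: move Q(s,a) toward r + gamma * Q(s',a')."""
    target = reward + GAMMA * q(s_next, a_next)
    Q[(s, a)] = q(s, a) + ALPHA * (target - q(s, a))

# Toy usage with hypothetical action labels for the ball-weighing task.
s, s_next = "start", "after_first_weighing"
a = softmax_choice(s, ["weigh_1v1", "weigh_2v2", "weigh_3v3", "guess"])
a_next = softmax_choice(s_next, ["weigh_1v1", "guess"])
sarsa_update(s, a, reward=0.0, s_next=s_next, a_next=a_next)
```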