Human Feedback as Action Assignment in Interactive Reinforcement Learning

Teaching by demonstrations and teaching by assigning rewards are two popular methods of knowledge transfer in humans. However, showing the right behaviour (by demonstration) may appear more natural to a human teacher than assessing the learner's performance and assigning a reward or punishment...

Bibliographic Details
Published in:	ACM transactions on autonomous and adaptive systems 2020-09, Vol.14 (4), p.1-24, Article 14
Main Authors:	Raza, Syed Ali; Williams, Mary-Anne
Format:	Article
Language:	eng
Subjects:	Computer Science; Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Theory & Methods; Science & Technology; Technology
Online Access:	Full text

container_end_page	24
container_issue	4
container_start_page	1
container_title	ACM transactions on autonomous and adaptive systems
container_volume	14
creator	Raza, Syed Ali; Williams, Mary-Anne
description	Teaching by demonstrations and teaching by assigning rewards are two popular methods of knowledge transfer in humans. However, showing the right behaviour (by demonstration) may appear more natural to a human teacher than assessing the learner's performance and assigning a reward or punishment to it. In the context of robot learning, the preference between these two approaches has not been studied extensively. In this article, we propose a method that replaces the traditional method of reward assignment with action assignment (which is similar to providing a demonstration) in interactive reinforcement learning. The main purpose of the suggested action is to compute a reward by seeing if the suggested action was followed by the self-acting agent or not. We compared action assignment with reward assignment via a user study conducted over the web using a two-dimensional maze game. The logs of interactions showed that action assignment significantly improved users' ability to teach the right behaviour. The survey results showed that both action and reward assignment seemed highly natural and usable, reward assignment required more mental effort, repeatedly assigning rewards and seeing the agent disobey commands caused frustration in users, and many users desired to control the agent's behaviour directly.
doi_str_mv	10.1145/3404197
format	Article
publisher	New York: Assoc Computing Machinery
orcid	https://orcid.org/0000-0002-1047-0503
fulltext	fulltext
identifier	ISSN: 1556-4665
ispartof	ACM transactions on autonomous and adaptive systems, 2020-09, Vol.14 (4), p.1-24, Article 14
issn	1556-4665; 1556-4703
language	eng
recordid	cdi_crossref_primary_10_1145_3404197
source	Access via ACM Digital Library; Web of Science - Science Citation Index Expanded - 2020
subjects	Computer Science; Computer Science, Artificial Intelligence; Computer Science, Information Systems; Computer Science, Theory & Methods; Science & Technology; Technology
title	Human Feedback as Action Assignment in Interactive Reinforcement Learning
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-12T14%3A34%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-webofscience_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Human%20Feedback%20as%20Action%20Assignment%20in%20Interactive%20Reinforcement%20Learning&rft.jtitle=ACM%20transactions%20on%20autonomous%20and%20adaptive%20systems&rft.au=Raza,%20Syed%20Ali&rft.date=2020-09-01&rft.volume=14&rft.issue=4&rft.spage=1&rft.epage=24&rft.pages=1-24&rft.artnum=14&rft.issn=1556-4665&rft.eissn=1556-4703&rft_id=info:doi/10.1145/3404197&rft_dat=%3Cwebofscience_cross%3E000575714500002%3C/webofscience_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true