Goal-driven active learning

Description

Deep reinforcement learning methods have achieved significant successes in complex decision-making problems. However, they traditionally rely on well-designed extrinsic rewards, which limits their applicability to many real-world tasks where rewards are naturally sparse. While cloning behaviors provided by an expert is a promising approach to the exploration problem, learning from a fixed set of demonstrations may be impracticable due to a lack of state coverage or to distribution mismatch, which arises when the learner's goal deviates from the demonstrated behaviors. Moreover, we are interested in learning how to reach a wide range of goals from the same set of demonstrations. In this work we propose a novel goal-conditioned method that leverages very small sets of goal-driven demonstrations to massively accelerate the learning process. Crucially, we introduce the concept of active goal-driven demonstrations to query the demonstrator only in hard-to-learn and uncertain regions of the state space. We further present a strategy for prioritizing the sampling of goals where the disagreement between the expert and the policy is maximized. We evaluate our method on a variety of benchmark environments from the MuJoCo domain. Experimental results show that our method outperforms prior imitation learning approaches on most of the tasks in terms of exploration efficiency and average scores.
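
The two mechanisms the abstract describes, querying the demonstrator only in uncertain regions and prioritizing goals where expert-policy disagreement is largest, can be illustrated with a short sketch. The Python code below is a hypothetical illustration under stated assumptions, not the authors' implementation: `expert.demonstrate`, `uncertainty_fn`, the L2 disagreement measure, and the softmax prioritization are all stand-ins for whatever the paper actually uses.

```python
import numpy as np

# Hypothetical sketch only; the paper's actual criteria and interfaces may differ.

def disagreement(policy_action: np.ndarray, expert_action: np.ndarray) -> float:
    """Expert-policy disagreement, measured here as the L2 distance
    between the two actions (an assumed, illustrative choice)."""
    return float(np.linalg.norm(policy_action - expert_action))

def sample_goal(goals, scores, temperature=1.0, rng=None):
    """Prioritized goal sampling: goals whose recorded expert-policy
    disagreement is larger are drawn more often (softmax weighting)."""
    rng = rng or np.random.default_rng()
    logits = np.asarray(scores, dtype=float) / temperature
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    return goals[rng.choice(len(goals), p=probs)]

def maybe_query_expert(state, goal, expert, uncertainty_fn, threshold):
    """Active goal-driven demonstration: ask the demonstrator only when
    the policy is uncertain at (state, goal); otherwise return None and
    let the current policy act on its own."""
    if uncertainty_fn(state, goal) > threshold:
        return expert.demonstrate(state, goal)  # assumed expert interface
    return None
```

In a training loop one would call `maybe_query_expert` at each visited state, append any returned demonstration to the imitation buffer, and use `sample_goal` to pick the next training goal.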

Bibliographic Details

Published in: Autonomous agents and multi-agent systems, 2021-10, Vol. 35 (2), Article 44
Authors: Bougie, Nicolas; Ichise, Ryutaro
Format: Article
Language: English
Publisher: New York: Springer US
ISSN: 1387-2532
EISSN: 1573-7454
DOI: 10.1007/s10458-021-09527-5
Source: SpringerLink Journals
Subjects: Artificial Intelligence; Cloning; Computer Science; Computer Systems Organization and Communication Networks; Decision making; Deep learning; Software Engineering/Programming and Operating Systems; User Interfaces and Human Computer Interaction
Rights: The Author(s) 2021. This work is published under a Creative Commons Attribution 4.0 license (http://creativecommons.org/licenses/by/4.0/).
Online access: Full text