Forgetful experience replay in hierarchical reinforcement learning from expert demonstrations

Deep reinforcement learning (RL) shows impressive results in complex gaming and robotic environments. These results are commonly achieved at the expense of huge computational costs and require an incredible number of episodes of interaction between the agent and the environment. Hierarchical methods and expert demonstrations are among the most promising approaches to improving the sample efficiency of reinforcement learning methods. In this paper, we propose a combination of methods that allows the agent to use low-quality demonstrations in complex vision-based environments with multiple related goals. Our Forgetful Experience Replay (ForgER) algorithm effectively handles errors in expert data and reduces quality losses when adapting the action space and state representation to the agent's capabilities. The proposed goal-oriented replay buffer structure allows the agent to automatically highlight sub-goals for solving complex hierarchical tasks in demonstrations. Our method is highly versatile and can be integrated into various off-policy methods. ForgER surpasses existing state-of-the-art RL methods that use expert demonstrations in complex environments. A solution based on our algorithm beats other solutions in the well-known MineRL competition and allows the agent to demonstrate expert-level behavior.

Bibliographic details

Published in: Knowledge-based systems, 2021-04, Vol. 218, p. 106844, Article 106844
Authors: Skrynnik, Alexey; Staroverov, Aleksey; Aitygulov, Ermek; Aksenov, Kirill; Davydov, Vasilii; Panov, Aleksandr I.
Format: Article
Language: English
Online access: Full text
Publisher: Amsterdam: Elsevier B.V.
DOI: 10.1016/j.knosys.2021.106844
ISSN: 0950-7051
EISSN: 1872-7409
Subjects: Algorithms; Deep learning; Expert demonstrations; ForgER; Goal-oriented reinforcement learning; Hierarchical reinforcement learning; Learning; Learning from demonstrations; Task complexity; Task-oriented augmentation; Teaching methods
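The abstract describes a goal-oriented replay buffer with a forgetting mechanism for imperfect expert data. The following is a minimal illustrative sketch of that idea, not the authors' implementation: a buffer partitioned by sub-goal that mixes expert and agent transitions, with an expert-sampling ratio that decays ("forgets") over training. All names (`ForgetfulReplayBuffer`, `expert_ratio`, `decay`) are hypothetical.

```python
import random
from collections import defaultdict, deque

class ForgetfulReplayBuffer:
    """Sketch of a goal-partitioned replay buffer whose expert-sampling
    ratio decays over time, gradually shifting batches from (possibly
    low-quality) demonstrations toward the agent's own experience."""

    def __init__(self, capacity=10_000, expert_ratio=0.8, decay=0.99):
        # One FIFO store per sub-goal, for expert and agent data separately.
        self.agent = defaultdict(lambda: deque(maxlen=capacity))
        self.expert = defaultdict(lambda: deque(maxlen=capacity))
        self.expert_ratio = expert_ratio  # fraction of each batch drawn from demos
        self.decay = decay                # multiplicative "forgetting" per sample call

    def add(self, subgoal, transition, from_expert=False):
        """Store a (state, action, reward, next_state) tuple under its sub-goal."""
        (self.expert if from_expert else self.agent)[subgoal].append(transition)

    def sample(self, subgoal, batch_size):
        """Draw a mixed batch for one sub-goal, then decay the expert ratio."""
        n_exp = min(int(round(batch_size * self.expert_ratio)),
                    len(self.expert[subgoal]))
        n_agent = min(batch_size - n_exp, len(self.agent[subgoal]))
        batch = (random.sample(list(self.expert[subgoal]), n_exp)
                 + random.sample(list(self.agent[subgoal]), n_agent))
        self.expert_ratio *= self.decay  # gradually forget imperfect demonstrations
        return batch
```

In this sketch, partitioning by sub-goal stands in for the paper's goal-oriented buffer structure, and the decaying ratio stands in for forgetting; the actual ForgER algorithm integrates with off-policy RL methods and is more involved.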