Hierarchical Reinforcement Learning Based on Planning Operators

Long-horizon manipulation tasks such as stacking represent a longstanding challenge in the field of robotic manipulation, particularly when using reinforcement learning (RL) methods, which often struggle to learn the correct sequence of actions for achieving these complex goals. To learn this sequence, symbolic planning methods offer a good solution based on high-level reasoning; however, planners often fall short in addressing the low-level control specificity needed for precise execution. This paper introduces a novel framework that integrates symbolic planning with hierarchical RL through the cooperation of high-level operators and low-level policies. Our contribution integrates planning operators (e.g., preconditions and effects) as part of a hierarchical RL algorithm based on the Scheduled Auxiliary Control (SAC-X) method. We developed a dual-purpose high-level operator, which can be used both in holistic planning and as an independent, reusable policy. Our approach offers a flexible solution for long-horizon tasks, e.g., stacking a cube. The experimental results show that our proposed method achieved an average success rate of 97.2% for learning and executing the whole stacking sequence, and high success rates for the independently learned policies, e.g., reach (98.9%), lift (99.7%), and stack (85%). The training time is also reduced by 68% when using our proposed approach.
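
The record does not spell out the framework's implementation, but the core idea described above, planning operators whose symbolic preconditions and effects decide which low-level RL policy runs next, can be sketched in a few lines of Python. Everything below is a hypothetical illustration: the names PlanningOperator and plan, the predicate strings, and the pi_* policy ids are assumptions for this sketch, not the paper's code or the SAC-X API.

# Minimal sketch (not the authors' implementation): high-level planning
# operators with symbolic preconditions/effects, each pointing to a learned
# low-level policy; a greedy forward-chaining loop recovers the action order.
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PlanningOperator:
    """High-level operator: symbolic pre/post-conditions plus the id of the
    low-level policy (an RL intention, in SAC-X terms) that realizes it."""
    name: str
    preconditions: frozenset        # predicates that must hold before execution
    effects_add: frozenset          # predicates added on success
    effects_del: frozenset = field(default_factory=frozenset)  # predicates removed
    policy_id: str = ""             # handle to the learned low-level policy

    def applicable(self, state: frozenset) -> bool:
        return self.preconditions <= state

    def apply(self, state: frozenset) -> frozenset:
        return (state - self.effects_del) | self.effects_add


def plan(state: frozenset, goal: frozenset, operators) -> list:
    """Greedy forward chaining: pick any applicable operator that makes
    progress until the goal predicates hold. A real planner would search;
    this is only a sketch."""
    sequence, seen = [], {state}
    while not goal <= state:
        progressed = False
        for op in operators:
            if op.applicable(state):
                nxt = op.apply(state)
                if nxt not in seen:
                    sequence.append(op)
                    state, progressed = nxt, True
                    seen.add(nxt)
                    break
        if not progressed:
            raise RuntimeError("no applicable operator makes progress")
    return sequence


if __name__ == "__main__":
    # Hypothetical cube-stacking domain: reach -> lift -> stack.
    reach = PlanningOperator("reach", frozenset({"cube_on_table"}),
                             frozenset({"gripper_at_cube"}), policy_id="pi_reach")
    lift = PlanningOperator("lift", frozenset({"gripper_at_cube"}),
                            frozenset({"holding_cube"}),
                            frozenset({"cube_on_table"}), policy_id="pi_lift")
    stack = PlanningOperator("stack", frozenset({"holding_cube"}),
                             frozenset({"cube_on_target"}),
                             frozenset({"holding_cube"}), policy_id="pi_stack")

    steps = plan(frozenset({"cube_on_table"}), frozenset({"cube_on_target"}),
                 [reach, lift, stack])
    print([op.policy_id for op in steps])   # ['pi_reach', 'pi_lift', 'pi_stack']

In the paper's setting the low-level policies would be trained as SAC-X auxiliary intentions and the same preconditions/effects would also guide scheduling during learning; the plan() loop above only illustrates how the symbolic layer orders reusable policies for a long-horizon task such as stacking.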

Bibliographic details

Main authors: Zhang, Jing; Dean, Emmanuel; Ramirez-Amaro, Karinne
Format: Article
Language: English
Published: 2023-09-25
Subjects: Computer Science - Robotics
DOI: 10.48550/arxiv.2309.14237
Source: arXiv.org
Online access: https://arxiv.org/abs/2309.14237