Hierarchical Reinforcement Learning Based on Planning Operators
creator | Zhang, Jing ; Dean, Emmanuel ; Ramirez-Amaro, Karinne |
description | Long-horizon manipulation tasks such as stacking represent a longstanding challenge in robotic manipulation, particularly for reinforcement learning (RL) methods, which often struggle to learn the correct sequence of actions for achieving these complex goals. Symbolic planning methods can provide this sequence through high-level reasoning; however, planners often fall short of the low-level control specificity needed for precise execution. This paper introduces a novel framework that integrates symbolic planning with hierarchical RL through the cooperation of high-level operators and low-level policies. Our contribution integrates planning operators (e.g., preconditions and effects) into a hierarchical RL algorithm based on the Scheduled Auxiliary Control (SAC-X) method. We developed dual-purpose high-level operators that can be used both in holistic planning and as independent, reusable policies. Our approach offers a flexible solution for long-horizon tasks such as stacking a cube. Experimental results show that the proposed method achieves an average success rate of 97.2% for learning and executing the whole stacking sequence, along with high success rates for the individual policies, e.g., reach (98.9%), lift (99.7%), and stack (85%). Training time is also reduced by 68% with the proposed approach. |
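The abstract describes operators defined by symbolic preconditions and effects that are both planned over and executed as learned low-level policies. The following minimal Python sketch illustrates that general idea only; it is not the authors' implementation or the SAC-X API, and every name in it (PlanningOperator, schedule, the toy predicates) is a hypothetical placeholder.

```python
# Minimal sketch (assumed, not the authors' code): planning operators with
# symbolic preconditions and effects wrapping learned low-level policies,
# chained by a greedy high-level scheduler for a stacking task.
from dataclasses import dataclass
from typing import Callable, FrozenSet

State = FrozenSet[str]  # set of symbolic predicates that currently hold

@dataclass
class PlanningOperator:
    name: str
    preconditions: FrozenSet[str]     # predicates required before execution
    effects: FrozenSet[str]           # predicates expected to hold afterwards
    policy: Callable[[State], State]  # stand-in for a learned low-level policy

    def applicable(self, state: State) -> bool:
        return self.preconditions <= state

def schedule(operators, state: State, goal: FrozenSet[str], max_steps: int = 10) -> State:
    """Pick any applicable operator whose effects are still missing, execute its
    low-level policy, and repeat until the goal predicates hold."""
    for _ in range(max_steps):
        if goal <= state:
            return state
        candidates = [op for op in operators
                      if op.applicable(state) and not op.effects <= state]
        if not candidates:
            raise RuntimeError(f"no applicable operator in state {sorted(state)}")
        op = candidates[0]
        state = op.policy(state)  # here a toy update; in the paper, an RL rollout
        print(f"executed {op.name}: {sorted(state)}")
    return state

if __name__ == "__main__":
    # Toy policies that simply assert their operator's effects.
    def asserts(effects):
        return lambda s: s | frozenset(effects)

    ops = [
        PlanningOperator("reach", frozenset(),            frozenset({"reached"}), asserts({"reached"})),
        PlanningOperator("grasp", frozenset({"reached"}), frozenset({"grasped"}), asserts({"grasped"})),
        PlanningOperator("lift",  frozenset({"grasped"}), frozenset({"lifted"}),  asserts({"lifted"})),
        PlanningOperator("stack", frozenset({"lifted"}),  frozenset({"stacked"}), asserts({"stacked"})),
    ]
    schedule(ops, frozenset(), goal=frozenset({"stacked"}))
```

In the paper, each operator's policy would instead be trained with the SAC-X-based hierarchical RL scheme, and its effects verified from the robot's observed state rather than simply asserted.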
doi_str_mv | 10.48550/arxiv.2309.14237 |
format | Article |
fullrecord | arXiv metadata record (creation date 2023-09-25; rights: http://arxiv.org/licenses/nonexclusive-distrib/1.0; open access) |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2309.14237 |
language | eng |
recordid | cdi_arxiv_primary_2309_14237 |
source | arXiv.org |
subjects | Computer Science - Robotics |
title | Hierarchical Reinforcement Learning Based on Planning Operators |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T05%3A57%3A31IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Hierarchical%20Reinforcement%20Learning%20Based%20on%20Planning%20Operators&rft.au=Zhang,%20Jing&rft.date=2023-09-25&rft_id=info:doi/10.48550/arxiv.2309.14237&rft_dat=%3Carxiv_GOX%3E2309_14237%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |