Disentangled Planning and Control in Vision Based Robotics via Reward Machines

In this work we augment a Deep Q-Learning agent with a Reward Machine (DQRM) to increase the speed of learning vision-based policies for robot tasks, and to overcome some of the limitations of DQN that prevent it from converging to good-quality policies. A reward machine (RM) is a finite state machine that decomposes a task into a discrete planning graph and equips the agent with a reward function to guide it toward task completion. The reward machine can be used both for reward shaping and for informing the policy which abstract state it is currently in. An abstract state is a high-level simplification of the current state, defined in terms of task-relevant features. These two supervisory signals from the reward machine, shaped rewards and knowledge of the current abstract state, complement each other, and both improve policy performance, as demonstrated on several vision-based robotic pick-and-place tasks. Particularly for vision-based robotics applications, it is often easier to build a reward machine than to get a policy to learn the task without this structure.
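
The reward machine described in the abstract is straightforward to illustrate in code. The sketch below is a minimal, hypothetical example, not the authors' implementation: the state names, event labels (object_grasped, object_dropped, object_placed), and reward values are invented here to mirror the pick-and-place setting the abstract describes.

# Minimal reward machine sketch (hypothetical, for illustration only).
# An RM is a finite state machine over abstract propositional events:
# each transition fires on a detected event and emits a shaping reward.

class RewardMachine:
    def __init__(self, initial_state, transitions, terminal_states):
        # transitions: {(state, event): (next_state, reward)}
        self.state = initial_state
        self.transitions = transitions
        self.terminal_states = terminal_states

    def step(self, event):
        """Advance the RM on a detected event; return the shaping reward."""
        next_state, reward = self.transitions.get(
            (self.state, event), (self.state, 0.0))
        self.state = next_state
        return reward

    def is_done(self):
        return self.state in self.terminal_states


# Example: a two-stage pick-and-place task.
rm = RewardMachine(
    initial_state="searching",
    transitions={
        ("searching", "object_grasped"): ("holding", 0.5),
        ("holding", "object_dropped"): ("searching", -0.5),
        ("holding", "object_placed"): ("done", 1.0),
    },
    terminal_states={"done"},
)

reward = rm.step("object_grasped")  # 0.5; rm.state is now "holding"
reward = rm.step("object_placed")   # 1.0; rm.is_done() is now True

In a DQRM-style agent, the Q-network would condition on the pair (camera image, rm.state), so the reward machine supplies both the shaping reward and the abstract-state input that the abstract identifies as complementary supervisory signals.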

Bibliographic Details
Main Authors: Camacho, Alberto; Varley, Jacob; Jain, Deepali; Iscen, Atil; Kalashnikov, Dmitry
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Robotics
Online Access: Order full text
creator Camacho, Alberto; Varley, Jacob; Jain, Deepali; Iscen, Atil; Kalashnikov, Dmitry
format Article
identifier DOI: 10.48550/arxiv.2012.14464
language eng
recordid cdi_arxiv_primary_2012_14464
source arXiv.org
subjects Computer Science - Artificial Intelligence; Computer Science - Robotics
title Disentangled Planning and Control in Vision Based Robotics via Reward Machines
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-06T08%3A50%3A23IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Disentangled%20Planning%20and%20Control%20in%20Vision%20Based%20Robotics%20via%20Reward%20Machines&rft.au=Camacho,%20Alberto&rft.date=2020-12-28&rft_id=info:doi/10.48550/arxiv.2012.14464&rft_dat=%3Carxiv_GOX%3E2012_14464%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true