Goal-Space Planning with Subgoal Models

This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives, such as Double DQN, even though the former uses significantly more memory and computation. The fundamental problem is that learned models can be inaccurate and often generate invalid states, especially when iterated many steps. In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models. This goal-space planning (GSP) approach is more computationally efficient, naturally incorporates temporal abstraction for faster long-horizon planning and avoids learning the transition dynamics entirely. We show that our GSP algorithm can propagate value from an abstract space in a manner that helps a variety of base learners learn significantly faster in different domains.

Bibliographic Details
Main authors: Lo, Chunlok, Roice, Kevin, Panahi, Parham Mohammad, Jordan, Scott, White, Adam, Mihucz, Gabor, Aminmansour, Farzane, White, Martha
Format: Article
Language: eng
creator Lo, Chunlok; Roice, Kevin; Panahi, Parham Mohammad; Jordan, Scott; White, Adam; Mihucz, Gabor; Aminmansour, Farzane; White, Martha
description This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives, such as Double DQN, even though the former uses significantly more memory and computation. The fundamental problem is that learned models can be inaccurate and often generate invalid states, especially when iterated many steps. In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models. This goal-space planning (GSP) approach is more computationally efficient, naturally incorporates temporal abstraction for faster long-horizon planning and avoids learning the transition dynamics entirely. We show that our GSP algorithm can propagate value from an abstract space in a manner that helps a variety of base learners learn significantly faster in different domains.
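The planning step the abstract describes, propagating value through an abstract space of subgoals using local, subgoal-conditioned models, can be illustrated with a minimal sketch. This is an assumption-laden toy, not the authors' implementation: the function name `plan_in_goal_space`, the matrices `r` and `g`, and the chain example are all hypothetical stand-ins for the learned subgoal-to-subgoal reward and discount models.

```python
import numpy as np

# Illustrative sketch only (hypothetical names, not the paper's code).
# For n abstract subgoals:
#   r[i, j]: reward accumulated travelling from subgoal i to subgoal j
#            (a large negative number where j is not locally reachable)
#   g[i, j]: discount accumulated over that traversal (0 where unreachable)
# These matrices stand in for the learned local, subgoal-conditioned models.
def plan_in_goal_space(r, g, n_iters=100):
    """Value iteration in the abstract subgoal space."""
    n = r.shape[0]
    v = np.zeros(n)
    for _ in range(n_iters):
        # v(i) = max_j [ r(i, j) + g(i, j) * v(j) ]
        v = np.max(r + g * v[None, :], axis=1)
    return v

# Toy chain of three subgoals 0 -> 1 -> 2, with subgoal 2 terminal.
NEG = -1e9  # marks pairs the local models deem unreachable
r = np.array([[NEG, 0.0, NEG],
              [NEG, NEG, 1.0],
              [NEG, NEG, 0.0]])
g = np.array([[0.0, 0.9, 0.0],
              [0.0, 0.0, 0.9],
              [0.0, 0.0, 0.0]])
v = plan_in_goal_space(r, g)  # subgoal values: [0.9, 1.0, 0.0]
```

Because the update needs only one max per subgoal per sweep, planning cost scales with the number of subgoals rather than the size of the state space; the resulting abstract values would then be used to guide or bootstrap a model-free base learner, which is the value-propagation role the abstract attributes to GSP.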
doi_str_mv 10.48550/arxiv.2206.02902
format Article
link https://arxiv.org/abs/2206.02902
creationdate 2022-06-06
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2206.02902
language eng
recordid cdi_arxiv_primary_2206_02902
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Learning
title Goal-Space Planning with Subgoal Models