Goal-Space Planning with Subgoal Models

This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives, such as Double DQN, even though the former uses significantly more memory and computation. The fundamental problem is that learned models can be inaccurate and often generate invalid states, especially when iterated many steps. In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models. This goal-space planning (GSP) approach is more computationally efficient, naturally incorporates temporal abstraction for faster long-horizon planning and avoids learning the transition dynamics entirely. We show that our GSP algorithm can propagate value from an abstract space in a manner that helps a variety of base learners learn significantly faster in different domains.

Bibliographic Details
Main authors: Lo, Chunlok, Roice, Kevin, Panahi, Parham Mohammad, Jordan, Scott, White, Adam, Mihucz, Gabor, Aminmansour, Farzane, White, Martha
Format: Article
Language: eng
creator Lo, Chunlok; Roice, Kevin; Panahi, Parham Mohammad; Jordan, Scott; White, Adam; Mihucz, Gabor; Aminmansour, Farzane; White, Martha
description This paper investigates a new approach to model-based reinforcement learning using background planning: mixing (approximate) dynamic programming updates and model-free updates, similar to the Dyna architecture. Background planning with learned models is often worse than model-free alternatives, such as Double DQN, even though the former uses significantly more memory and computation. The fundamental problem is that learned models can be inaccurate and often generate invalid states, especially when iterated many steps. In this paper, we avoid this limitation by constraining background planning to a set of (abstract) subgoals and learning only local, subgoal-conditioned models. This goal-space planning (GSP) approach is more computationally efficient, naturally incorporates temporal abstraction for faster long-horizon planning and avoids learning the transition dynamics entirely. We show that our GSP algorithm can propagate value from an abstract space in a manner that helps a variety of base learners learn significantly faster in different domains.
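The planning step the abstract describes, propagating value through an abstract space of subgoals using local, subgoal-conditioned models, can be illustrated with a minimal sketch. This is an assumption-laden toy, not the authors' implementation: the function name `plan_in_goal_space`, the matrices `r` and `g`, and the chain example are all hypothetical stand-ins for the learned subgoal-to-subgoal reward and discount models.

```python
import numpy as np

# Illustrative sketch only (hypothetical names, not the paper's code).
# For n abstract subgoals:
#   r[i, j]: reward accumulated travelling from subgoal i to subgoal j
#            (a large negative number where j is not locally reachable)
#   g[i, j]: discount accumulated over that traversal (0 where unreachable)
# These matrices stand in for the learned local, subgoal-conditioned models.
def plan_in_goal_space(r, g, n_iters=100):
    """Value iteration in the abstract subgoal space."""
    n = r.shape[0]
    v = np.zeros(n)
    for _ in range(n_iters):
        # v(i) = max_j [ r(i, j) + g(i, j) * v(j) ]
        v = np.max(r + g * v[None, :], axis=1)
    return v

# Toy chain of three subgoals 0 -> 1 -> 2, with subgoal 2 terminal.
NEG = -1e9  # marks pairs the local models deem unreachable
r = np.array([[NEG, 0.0, NEG],
              [NEG, NEG, 1.0],
              [NEG, NEG, 0.0]])
g = np.array([[0.0, 0.9, 0.0],
              [0.0, 0.0, 0.9],
              [0.0, 0.0, 0.0]])
v = plan_in_goal_space(r, g)  # subgoal values: [0.9, 1.0, 0.0]
```

Because the update needs only one max per subgoal per sweep, planning cost scales with the number of subgoals rather than the size of the state space; the resulting abstract values would then be used to guide or bootstrap a model-free base learner, which is the value-propagation role the abstract attributes to GSP.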
doi_str_mv 10.48550/arxiv.2206.02902
format Article
link https://arxiv.org/abs/2206.02902
creationdate 2022-06-06
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2206.02902
language eng
recordid cdi_arxiv_primary_2206_02902
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Learning
title Goal-Space Planning with Subgoal Models