Acting upon Imagination: when to trust imagined trajectories in model based reinforcement learning

Model-based reinforcement learning (MBRL) aims to learn model(s) of the environment dynamics that can predict the outcome of its actions. Forward application of the model yields so-called imagined trajectories (sequences of action, predicted state-reward) used to optimize the set of candidate actions that maximize expected reward. The outcome, an ideal imagined trajectory or plan, is imperfect, and MBRL typically relies on model predictive control (MPC) to overcome this by continuously re-planning from scratch, thus incurring major computational cost and increasing complexity in tasks with a longer receding horizon. We propose uncertainty estimation methods for the online evaluation of imagined trajectories, to assess whether further planned actions can be trusted to deliver acceptable reward. These methods include comparing the error after performing the last action with the standard expected error, and using model uncertainty to assess the deviation from expected outcomes. Additionally, we introduce methods that exploit the forward propagation of the dynamics model to evaluate whether the remainder of the plan aligns with expected results, and to assess the remainder of the plan in terms of the expected reward. Our experiments demonstrate the effectiveness of the proposed uncertainty estimation methods by applying them to avoid unnecessary trajectory replanning in a shooting MBRL setting. Results highlight a significant reduction in computational costs without sacrificing performance.
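
To make the first of these criteria concrete, the sketch below is a minimal illustration, not code from the paper: the class name `ReplanTrigger`, the mean-plus-k-standard-deviations threshold, and all parameter names are assumptions made for the example. It shows how an MPC-style agent might compare the observed one-step prediction error with the error the dynamics model typically makes, and ask for replanning only when the current plan looks untrustworthy.

```python
import numpy as np

class ReplanTrigger:
    """Decide whether the remainder of a plan can still be trusted."""

    def __init__(self, k: float = 2.0, history: int = 500, warmup: int = 10):
        self.k = k            # tolerance, in standard deviations of the usual error
        self.history = history
        self.warmup = warmup
        self.errors = []      # recent one-step prediction errors

    def update(self, predicted_state: np.ndarray, observed_state: np.ndarray) -> bool:
        """Record the newest one-step error and return True if replanning is advised."""
        err = float(np.linalg.norm(predicted_state - observed_state))
        replan = False
        if len(self.errors) >= self.warmup:
            mean = float(np.mean(self.errors))
            std = float(np.std(self.errors)) + 1e-8
            # Replan only when the deviation exceeds the standard expected error.
            replan = err > mean + self.k * std
        self.errors.append(err)
        self.errors = self.errors[-self.history:]
        return replan
```

In a shooting MBRL loop such a check would sit between environment steps: execute the next action of the current plan, compare the model's predicted next state with the state actually observed, and re-run the expensive trajectory optimization only when the trigger fires.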

Bibliographic Details

Main authors: Remonda, Adrian; Veas, Eduardo; Luzhnica, Granit
Format: Article
Language: English
Published: 2021-05-12
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning
Source: arXiv.org
DOI: 10.48550/arxiv.2105.05716
Online access: Order full text