Acting upon Imagination: when to trust imagined trajectories in model based reinforcement learning

Model-based reinforcement learning (MBRL) aims to learn model(s) of the environment dynamics that can predict the outcome of its actions. Forward application of the model yields so-called imagined trajectories (sequences of action, predicted state-reward) used to optimize the set of candidate actions that maximize expected reward. The outcome, an ideal imagined trajectory or plan, is imperfect, and MBRL typically relies on model predictive control (MPC) to overcome this by continuously re-planning from scratch, thus incurring major computational cost and increasing complexity in tasks with a longer receding horizon. We propose uncertainty estimation methods for the online evaluation of imagined trajectories, to assess whether further planned actions can be trusted to deliver acceptable reward. These methods include comparing the error after performing the last action with the standard expected error, and using model uncertainty to assess the deviation from expected outcomes. Additionally, we introduce methods that exploit the forward propagation of the dynamics model to evaluate whether the remainder of the plan aligns with expected results, and to assess the remainder of the plan in terms of the expected reward. Our experiments demonstrate the effectiveness of the proposed uncertainty estimation methods by applying them to avoid unnecessary trajectory replanning in a shooting MBRL setting. Results highlight a significant reduction in computational costs without sacrificing performance.
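
To make the first of these criteria concrete, the sketch below is a minimal illustration, not code from the paper: the class name `ReplanTrigger`, the mean-plus-k-standard-deviations threshold, and all parameter names are assumptions made for the example. It shows how an MPC-style agent might compare the observed one-step prediction error with the error the dynamics model typically makes, and ask for replanning only when the current plan looks untrustworthy.

```python
import numpy as np

class ReplanTrigger:
    """Decide whether the remainder of a plan can still be trusted."""

    def __init__(self, k: float = 2.0, history: int = 500, warmup: int = 10):
        self.k = k            # tolerance, in standard deviations of the usual error
        self.history = history
        self.warmup = warmup
        self.errors = []      # recent one-step prediction errors

    def update(self, predicted_state: np.ndarray, observed_state: np.ndarray) -> bool:
        """Record the newest one-step error and return True if replanning is advised."""
        err = float(np.linalg.norm(predicted_state - observed_state))
        replan = False
        if len(self.errors) >= self.warmup:
            mean = float(np.mean(self.errors))
            std = float(np.std(self.errors)) + 1e-8
            # Replan only when the deviation exceeds the standard expected error.
            replan = err > mean + self.k * std
        self.errors.append(err)
        self.errors = self.errors[-self.history:]
        return replan
```

In a shooting MBRL loop such a check would sit between environment steps: execute the next action of the current plan, compare the model's predicted next state with the state actually observed, and re-run the expensive trajectory optimization only when the trigger fires.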

Bibliographic Details

Main authors: Remonda, Adrian; Veas, Eduardo; Luzhnica, Granit
Format: Article
Language: English
Published: 2021-05-12
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning
Source: arXiv.org
DOI: 10.48550/arxiv.2105.05716
Online access: Order full text