Model-Based Uncertainty in Value Functions

We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning. In particular, we focus on characterizing the variance over values induced by a distribution over MDPs. Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation, but the over-approximation may result in inefficient exploration. We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values and explicitly characterizes the gap in previous work. Moreover, our uncertainty quantification technique is easily integrated into common exploration strategies and scales naturally beyond the tabular setting by using standard deep reinforcement learning architectures. Experiments in difficult exploration tasks, both in tabular and continuous control settings, show that our sharper uncertainty estimates improve sample-efficiency.
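
For concreteness, below is a minimal sketch of the quantity the abstract targets, under illustrative assumptions that are not taken from the paper (a toy tabular MDP, a Dirichlet posterior over transition models, made-up sizes and rewards): sample models from the posterior, evaluate a fixed policy exactly in each sampled MDP, and take the empirical variance of the resulting value functions per state. The uncertainty Bellman equations discussed in the abstract aim to recover, or upper bound, exactly this posterior variance without such a sampling loop.

# Monte Carlo estimate of the posterior variance over values (illustrative
# sketch, not the paper's method). The Dirichlet "posterior", reward table,
# and uniform policy below are toy placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.9
n_samples = 1000

# Toy "posterior" over MDP models: Dirichlet counts over next states for each
# (state, action) pair, plus a fixed reward table and a fixed stochastic policy.
counts = rng.integers(1, 10, size=(n_states, n_actions, n_states))
rewards = rng.random((n_states, n_actions))
policy = np.full((n_states, n_actions), 1.0 / n_actions)

def policy_evaluation(P, R, pi, gamma):
    # Exact policy evaluation in a tabular MDP: V = (I - gamma * P_pi)^(-1) r_pi
    P_pi = np.einsum("sa,san->sn", pi, P)  # state-to-state kernel under pi
    r_pi = np.einsum("sa,sa->s", pi, R)    # expected one-step reward under pi
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)

# Sample transition models from the posterior and evaluate the policy in each.
values = np.array([
    policy_evaluation(
        np.array([[rng.dirichlet(counts[s, a]) for a in range(n_actions)]
                  for s in range(n_states)]),
        rewards, policy, gamma)
    for _ in range(n_samples)
])

print("posterior mean of V(s):    ", values.mean(axis=0))
print("posterior variance of V(s):", values.var(axis=0))

In an exploration strategy, such a variance estimate would typically enter as an optimism bonus or as the width of a confidence interval around the value estimate; the abstract's claim is that a sharper estimate of this variance, free of the over-approximation in earlier uncertainty Bellman equations, yields more sample-efficient exploration.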

Bibliographic Details
Main authors: Luis, Carlos E, Bottero, Alessandro G, Vinogradska, Julia, Berkenkamp, Felix, Peters, Jan
Format: Article
Language: eng
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning; Statistics - Machine Learning
creator Luis, Carlos E; Bottero, Alessandro G; Vinogradska, Julia; Berkenkamp, Felix; Peters, Jan
description We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning. In particular, we focus on characterizing the variance over values induced by a distribution over MDPs. Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation, but the over-approximation may result in inefficient exploration. We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values and explicitly characterizes the gap in previous work. Moreover, our uncertainty quantification technique is easily integrated into common exploration strategies and scales naturally beyond the tabular setting by using standard deep reinforcement learning architectures. Experiments in difficult exploration tasks, both in tabular and continuous control settings, show that our sharper uncertainty estimates improve sample-efficiency.
doi_str_mv 10.48550/arxiv.2302.12526
format Article
identifier DOI: 10.48550/arxiv.2302.12526
language eng
recordid cdi_arxiv_primary_2302_12526
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Learning
Statistics - Machine Learning
title Model-Based Uncertainty in Value Functions