Model-Based Uncertainty in Value Functions

We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning. In particular, we focus on characterizing the variance over values induced by a distribution over MDPs. Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation, but the over-approximation may result in inefficient exploration. We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values and explicitly characterizes the gap in previous work. Moreover, our uncertainty quantification technique is easily integrated into common exploration strategies and scales naturally beyond the tabular setting by using standard deep reinforcement learning architectures. Experiments in difficult exploration tasks, both in tabular and continuous control settings, show that our sharper uncertainty estimates improve sample-efficiency.
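
For concreteness, below is a minimal sketch of the quantity the abstract targets, under illustrative assumptions that are not taken from the paper (a toy tabular MDP, a Dirichlet posterior over transition models, made-up sizes and rewards): sample models from the posterior, evaluate a fixed policy exactly in each sampled MDP, and take the empirical variance of the resulting value functions per state. The uncertainty Bellman equations discussed in the abstract aim to recover, or upper bound, exactly this posterior variance without such a sampling loop.

# Monte Carlo estimate of the posterior variance over values (illustrative
# sketch, not the paper's method). The Dirichlet "posterior", reward table,
# and uniform policy below are toy placeholders.
import numpy as np

rng = np.random.default_rng(0)
n_states, n_actions, gamma = 5, 2, 0.9
n_samples = 1000

# Toy "posterior" over MDP models: Dirichlet counts over next states for each
# (state, action) pair, plus a fixed reward table and a fixed stochastic policy.
counts = rng.integers(1, 10, size=(n_states, n_actions, n_states))
rewards = rng.random((n_states, n_actions))
policy = np.full((n_states, n_actions), 1.0 / n_actions)

def policy_evaluation(P, R, pi, gamma):
    # Exact policy evaluation in a tabular MDP: V = (I - gamma * P_pi)^(-1) r_pi
    P_pi = np.einsum("sa,san->sn", pi, P)  # state-to-state kernel under pi
    r_pi = np.einsum("sa,sa->s", pi, R)    # expected one-step reward under pi
    return np.linalg.solve(np.eye(len(r_pi)) - gamma * P_pi, r_pi)

# Sample transition models from the posterior and evaluate the policy in each.
values = np.array([
    policy_evaluation(
        np.array([[rng.dirichlet(counts[s, a]) for a in range(n_actions)]
                  for s in range(n_states)]),
        rewards, policy, gamma)
    for _ in range(n_samples)
])

print("posterior mean of V(s):    ", values.mean(axis=0))
print("posterior variance of V(s):", values.var(axis=0))

In an exploration strategy, such a variance estimate would typically enter as an optimism bonus or as the width of a confidence interval around the value estimate; the abstract's claim is that a sharper estimate of this variance, free of the over-approximation in earlier uncertainty Bellman equations, yields more sample-efficient exploration.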

Bibliographic Details
Main authors: Luis, Carlos E, Bottero, Alessandro G, Vinogradska, Julia, Berkenkamp, Felix, Peters, Jan
Format: Article
Language: eng
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning; Statistics - Machine Learning
creator Luis, Carlos E; Bottero, Alessandro G; Vinogradska, Julia; Berkenkamp, Felix; Peters, Jan
description We consider the problem of quantifying uncertainty over expected cumulative rewards in model-based reinforcement learning. In particular, we focus on characterizing the variance over values induced by a distribution over MDPs. Previous work upper bounds the posterior variance over values by solving a so-called uncertainty Bellman equation, but the over-approximation may result in inefficient exploration. We propose a new uncertainty Bellman equation whose solution converges to the true posterior variance over values and explicitly characterizes the gap in previous work. Moreover, our uncertainty quantification technique is easily integrated into common exploration strategies and scales naturally beyond the tabular setting by using standard deep reinforcement learning architectures. Experiments in difficult exploration tasks, both in tabular and continuous control settings, show that our sharper uncertainty estimates improve sample-efficiency.
doi_str_mv 10.48550/arxiv.2302.12526
format Article
identifier DOI: 10.48550/arxiv.2302.12526
language eng
recordid cdi_arxiv_primary_2302_12526
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Learning
Statistics - Machine Learning
title Model-Based Uncertainty in Value Functions