Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning
Saved in:
Main authors: | Tan, Tian; Xiong, Zhihan; Dwaracherla, Vikranth R |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Artificial Intelligence; Computer Science - Learning; Statistics - Machine Learning |
Online access: | Order full text |
creator | Tan, Tian ; Xiong, Zhihan ; Dwaracherla, Vikranth R |
description | It is well known that quantifying uncertainty in action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively tractable way of doing this using randomized value functions, but it still demands substantial computational resources on complex problems. In this paper, we present an alternative, computationally efficient way to induce exploration using index sampling. We use an indexed value function to represent uncertainty in our action-value estimates. We first present an algorithm that learns a parameterized indexed value function through a distributional version of temporal difference learning in a tabular setting, and we prove a regret bound for it. Then, from a computational point of view, we propose a dual-network architecture, Parameterized Indexed Networks (PINs), comprising one mean network and one uncertainty network, to learn the indexed value function. Finally, we demonstrate the efficacy of PINs through computational experiments. |
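As a reading aid for the description above, here is a minimal sketch of the general idea of an indexed value function in a tabular setting: a mean estimate and an uncertainty estimate are combined through a random index sampled once per episode, so the agent acts greedily with respect to one sampled value function at a time. The additive form mean + uncertainty × index, the Gaussian index, and all names (`IndexedQ`, `sample_index`) are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

class IndexedQ:
    """Tabular indexed action-value function: Q_z(s, a) = mean(s, a) + sigma(s, a) * z.

    A hypothetical sketch; the paper's PINs replace these tables with a mean
    network and an uncertainty network.
    """

    def __init__(self, n_states, n_actions, prior_scale=1.0):
        # "Mean network" and "uncertainty network" analogues, here plain tables.
        self.mean = np.zeros((n_states, n_actions))
        self.sigma = prior_scale * np.ones((n_states, n_actions))

    def sample_index(self, rng):
        # One index drawn per episode keeps exploration temporally consistent,
        # in the spirit of sampling a single randomized value function.
        return rng.standard_normal()

    def greedy_action(self, s, z):
        # Act greedily with respect to the value function selected by index z.
        return int(np.argmax(self.mean[s] + self.sigma[s] * z))

rng = np.random.default_rng(0)
q = IndexedQ(n_states=10, n_actions=4)
z = q.sample_index(rng)        # fixed for the whole episode
a = q.greedy_action(0, z)      # exploration comes from z, no epsilon needed
```

Under this reading, where sigma is large the sampled index swings the action values widely, driving exploration of uncertain state-action pairs, while a sigma that shrinks with experience recovers greedy behavior.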
doi_str_mv | 10.48550/arxiv.1912.10577 |
format | Article |
identifier | DOI: 10.48550/arxiv.1912.10577 |
language | eng |
recordid | cdi_arxiv_primary_1912_10577 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence ; Computer Science - Learning ; Statistics - Machine Learning |
title | Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T05%3A17%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Parameterized%20Indexed%20Value%20Function%20for%20Efficient%20Exploration%20in%20Reinforcement%20Learning&rft.au=Tan,%20Tian&rft.date=2019-12-22&rft_id=info:doi/10.48550/arxiv.1912.10577&rft_dat=%3Carxiv_GOX%3E1912_10577%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |