Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning
Saved in:
Main authors: | Tan, Tian; Xiong, Zhihan; Dwaracherla, Vikranth R |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Artificial Intelligence; Computer Science - Learning; Statistics - Machine Learning |
Online access: | Order full text |
creator | Tan, Tian ; Xiong, Zhihan ; Dwaracherla, Vikranth R |
description | It is well known that quantifying uncertainty in action-value estimates is crucial for efficient exploration in reinforcement learning. Ensemble sampling offers a relatively tractable way of doing this using randomized value functions, but it still demands substantial computational resources on complex problems. In this paper, we present an alternative, computationally efficient way to induce exploration using index sampling. We use an indexed value function to represent uncertainty in our action-value estimates. We first present an algorithm that learns a parameterized indexed value function through a distributional version of temporal difference learning in a tabular setting, and we prove a regret bound for it. Then, from a computational point of view, we propose a dual-network architecture, Parameterized Indexed Networks (PINs), comprising one mean network and one uncertainty network, to learn the indexed value function. Finally, we demonstrate the efficacy of PINs through computational experiments. |
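As a reading aid for the description above, here is a minimal sketch of the general idea of an indexed value function in a tabular setting: a mean estimate and an uncertainty estimate are combined through a random index sampled once per episode, so the agent acts greedily with respect to one sampled value function at a time. The additive form mean + uncertainty × index, the Gaussian index, and all names (`IndexedQ`, `sample_index`) are illustrative assumptions, not the paper's exact construction.

```python
import numpy as np

class IndexedQ:
    """Tabular indexed action-value function: Q_z(s, a) = mean(s, a) + sigma(s, a) * z.

    A hypothetical sketch; the paper's PINs replace these tables with a mean
    network and an uncertainty network.
    """

    def __init__(self, n_states, n_actions, prior_scale=1.0):
        # "Mean network" and "uncertainty network" analogues, here plain tables.
        self.mean = np.zeros((n_states, n_actions))
        self.sigma = prior_scale * np.ones((n_states, n_actions))

    def sample_index(self, rng):
        # One index drawn per episode keeps exploration temporally consistent,
        # in the spirit of sampling a single randomized value function.
        return rng.standard_normal()

    def greedy_action(self, s, z):
        # Act greedily with respect to the value function selected by index z.
        return int(np.argmax(self.mean[s] + self.sigma[s] * z))

rng = np.random.default_rng(0)
q = IndexedQ(n_states=10, n_actions=4)
z = q.sample_index(rng)        # fixed for the whole episode
a = q.greedy_action(0, z)      # exploration comes from z, no epsilon needed
```

Under this reading, where sigma is large the sampled index swings the action values widely, driving exploration of uncertain state-action pairs, while a sigma that shrinks with experience recovers greedy behavior.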
doi_str_mv | 10.48550/arxiv.1912.10577 |
format | Article |
identifier | DOI: 10.48550/arxiv.1912.10577 |
language | eng |
recordid | cdi_arxiv_primary_1912_10577 |
source | arXiv.org |
subjects | Computer Science - Artificial Intelligence ; Computer Science - Learning ; Statistics - Machine Learning |
title | Parameterized Indexed Value Function for Efficient Exploration in Reinforcement Learning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-29T05%3A17%3A27IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Parameterized%20Indexed%20Value%20Function%20for%20Efficient%20Exploration%20in%20Reinforcement%20Learning&rft.au=Tan,%20Tian&rft.date=2019-12-22&rft_id=info:doi/10.48550/arxiv.1912.10577&rft_dat=%3Carxiv_GOX%3E1912_10577%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |