Distributional Robustness and Regularization in Reinforcement Learning

Detailed description

Distributionally Robust Optimization (DRO) has made it possible to prove the equivalence between robustness and regularization in classification and regression, thus providing an analytical reason why regularization generalizes well in statistical learning. Although DRO's extension to sequential decision-making overcomes \(\textit{external uncertainty}\) through the robust Markov Decision Process (MDP) setting, the resulting formulation is hard to solve, especially on large domains. On the other hand, existing regularization methods in reinforcement learning only address \(\textit{internal uncertainty}\) due to stochasticity. Our study aims to facilitate robust reinforcement learning by establishing a dual relation between robust MDPs and regularization. We introduce Wasserstein distributionally robust MDPs and prove that they satisfy out-of-sample performance guarantees. We then introduce a new regularizer for empirical value functions and show that it lower bounds the Wasserstein distributionally robust value function. We extend the result to linear value function approximation for large state spaces. Our approach provides an alternative formulation of robustness with guaranteed finite-sample performance. Moreover, it suggests using regularization as a practical tool for dealing with \(\textit{external uncertainty}\) in reinforcement learning methods.
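
As a rough illustration of the duality the description refers to (a schematic sketch in generic notation, not the paper's exact statements; the radius \(\epsilon\), the empirical model \(\hat{P}\), and the regularizer \(R^{\pi}\) below are illustrative placeholders), Wasserstein DRO in supervised learning turns a worst-case expectation over a Wasserstein ball around the empirical distribution \(\hat{P}_n\) into an explicit penalty for an \(L\)-Lipschitz loss \(\ell_\theta\):
\[
\sup_{Q:\, W_1(Q,\hat{P}_n)\le\epsilon} \mathbb{E}_{Q}[\ell_\theta] \;\le\; \mathbb{E}_{\hat{P}_n}[\ell_\theta] + \epsilon L .
\]
In the sequential setting, a robust MDP evaluates a policy \(\pi\) against the worst transition model in an uncertainty set \(\mathcal{P}\),
\[
v^{\pi}_{\mathcal{P}}(s) \;=\; \inf_{P\in\mathcal{P}} \mathbb{E}^{\pi,P}\Big[\sum_{t=0}^{\infty}\gamma^{t} r(s_t,a_t)\,\Big|\, s_0=s\Big],
\]
and the result announced above can then be read, schematically, as a regularized empirical value function lower-bounding its Wasserstein distributionally robust counterpart over a ball \(\mathcal{B}_\epsilon(\hat{P})\) of transition models:
\[
\hat{v}^{\pi}(s) - \epsilon\, R^{\pi}(s) \;\le\; v^{\pi}_{\mathcal{B}_\epsilon(\hat{P})}(s).
\]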

Bibliographic details

Published in: arXiv.org, 2020-07
Authors: Derman, Esther; Mannor, Shie
Format: Article
Language: English
EISSN: 2331-8422
Publisher: Ithaca: Cornell University Library, arXiv.org
Subjects: Decision making; Empirical analysis; Learning; Lower bounds; Markov analysis; Markov processes; Mathematical analysis; Optimization; Regression analysis; Regularization; Regularization methods; Robustness (mathematics); Statistical analysis; Uncertainty
Online access: Full text