RORL: Robust Offline Reinforcement Learning via Conservative Smoothing

Offline reinforcement learning (RL) provides a promising direction for exploiting massive amounts of offline data in complex decision-making tasks. Due to the distribution-shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when they encounter observation deviations under realistic conditions, such as sensor errors and adversarial attacks. To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on these states. Theoretically, we show that RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL achieves state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.
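
As a concrete illustration of the approach the abstract describes, the following is a minimal sketch of a conservative smoothing regularizer: sample states within a small perturbation ball around dataset states, penalize changes in the Q-value and the policy on those perturbed states, and additionally push down Q-values that exceed the dataset estimate. The function name, the l_inf sampling scheme, and the hyperparameters epsilon, n_samples, and beta are illustrative assumptions, not the paper's exact objective.

# Hypothetical sketch of a conservative smoothing regularizer (assumed form,
# not the paper's exact losses): smooth the Q-function and the policy over
# states perturbed within a small l_inf ball around dataset states, and
# additionally penalize Q-values on perturbed states that exceed the dataset estimate.
import torch
import torch.nn.functional as F

def conservative_smoothing_losses(q_net, policy, states, actions,
                                  epsilon=0.005, n_samples=10, beta=0.1):
    # q_net(s, a) -> Q-values [batch, 1]; policy(s) -> actions [batch, act_dim].
    # epsilon, n_samples, beta are illustrative hyperparameters.
    q_clean = q_net(states, actions)
    a_clean = policy(states)
    smooth_loss = torch.zeros((), device=states.device)
    conservative = torch.zeros((), device=states.device)
    for _ in range(n_samples):
        # Uniform perturbation inside the l_inf ball of radius epsilon.
        noise = (torch.rand_like(states) * 2.0 - 1.0) * epsilon
        s_pert = states + noise
        q_pert = q_net(s_pert, actions)
        # Value smoothing: Q should be stable under small state perturbations.
        smooth_loss = smooth_loss + F.mse_loss(q_pert, q_clean.detach())
        # Policy smoothing: actions should also be stable near dataset states.
        smooth_loss = smooth_loss + F.mse_loss(policy(s_pert), a_clean.detach())
        # Conservative term: discourage overestimation on perturbed (out-of-dataset) states.
        conservative = conservative + torch.relu(q_pert - q_clean.detach()).mean()
    return smooth_loss / n_samples, beta * conservative / n_samples

In practice, such terms would be added to a standard offline actor-critic objective; the exact RORL losses, their weighting, and the accompanying theoretical analysis are given in the paper.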


Bibliographic details
Published in: arXiv.org, 2022-10
Main authors: Yang, Rui; Bai, Chenjia; Ma, Xiaoteng; Wang, Zhaoran; Zhang, Chongjie; Han, Lei
Format: Article
Language: English
Subjects: Algorithms; Conservatism; Decision making; Learning; Perturbation; Regularization; Robustness; Smoothing; Task complexity
Online access: Full text
EISSN: 2331-8422