RORL: Robust Offline Reinforcement Learning via Conservative Smoothing

Offline reinforcement learning (RL) provides a promising direction for exploiting massive amounts of offline data in complex decision-making tasks. Due to the distribution-shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when they encounter observation deviations under realistic conditions, such as sensor errors and adversarial attacks. To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on these states. Theoretically, we show that RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL achieves state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.
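
As a concrete illustration of the approach the abstract describes, the following is a minimal sketch of a conservative smoothing regularizer: sample states within a small perturbation ball around dataset states, penalize changes in the Q-value and the policy on those perturbed states, and additionally push down Q-values that exceed the dataset estimate. The function name, the l_inf sampling scheme, and the hyperparameters epsilon, n_samples, and beta are illustrative assumptions, not the paper's exact objective.

# Hypothetical sketch of a conservative smoothing regularizer (assumed form,
# not the paper's exact losses): smooth the Q-function and the policy over
# states perturbed within a small l_inf ball around dataset states, and
# additionally penalize Q-values on perturbed states that exceed the dataset estimate.
import torch
import torch.nn.functional as F

def conservative_smoothing_losses(q_net, policy, states, actions,
                                  epsilon=0.005, n_samples=10, beta=0.1):
    # q_net(s, a) -> Q-values [batch, 1]; policy(s) -> actions [batch, act_dim].
    # epsilon, n_samples, beta are illustrative hyperparameters.
    q_clean = q_net(states, actions)
    a_clean = policy(states)
    smooth_loss = torch.zeros((), device=states.device)
    conservative = torch.zeros((), device=states.device)
    for _ in range(n_samples):
        # Uniform perturbation inside the l_inf ball of radius epsilon.
        noise = (torch.rand_like(states) * 2.0 - 1.0) * epsilon
        s_pert = states + noise
        q_pert = q_net(s_pert, actions)
        # Value smoothing: Q should be stable under small state perturbations.
        smooth_loss = smooth_loss + F.mse_loss(q_pert, q_clean.detach())
        # Policy smoothing: actions should also be stable near dataset states.
        smooth_loss = smooth_loss + F.mse_loss(policy(s_pert), a_clean.detach())
        # Conservative term: discourage overestimation on perturbed (out-of-dataset) states.
        conservative = conservative + torch.relu(q_pert - q_clean.detach()).mean()
    return smooth_loss / n_samples, beta * conservative / n_samples

In practice, such terms would be added to a standard offline actor-critic objective; the exact RORL losses, their weighting, and the accompanying theoretical analysis are given in the paper.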


Bibliographic details
Published in: arXiv.org, 2022-10
Main authors: Yang, Rui; Bai, Chenjia; Ma, Xiaoteng; Wang, Zhaoran; Zhang, Chongjie; Han, Lei
Format: Article
Language: English
Subjects: Algorithms; Conservatism; Decision making; Learning; Perturbation; Regularization; Robustness; Smoothing; Task complexity
Online access: Full text
EISSN: 2331-8422