RORL: Robust Offline Reinforcement Learning via Conservative Smoothing
Offline reinforcement learning (RL) provides a promising direction for exploiting massive amounts of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when they encounter observation deviations under realistic conditions, such as sensor errors and adversarial attacks. To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on these states. Theoretically, we show that RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL can achieve state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations.
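The abstract's central mechanism lends itself to a short illustration: smooth the value function and policy over states sampled from a small ball around dataset states, and keep value estimates conservative on those perturbed states. The PyTorch-style sketch below illustrates that idea only and is not the authors' released implementation; the callables `q_net` and `policy`, the uniform epsilon-ball sampling, and all coefficients (`epsilon`, `n_samples`, `beta_q`, `beta_pi`, `beta_ood`) are assumptions made for the example.

```python
# A minimal sketch (not the authors' code) of conservative smoothing as
# described in the abstract: regularize Q and the policy toward their
# values at the original states, and push Q down on perturbed states.
import torch


def conservative_smoothing_loss(q_net, policy, states, actions,
                                epsilon=0.005, n_samples=10,
                                beta_q=1.0, beta_pi=1.0, beta_ood=0.1):
    """Smoothing + conservative penalty over an L-inf ball around `states`.

    Assumes q_net(states, actions) -> (B, 1) and policy(states) -> (B, A).
    """
    batch, obs_dim = states.shape

    # Sample n_samples perturbed copies of each state uniformly from the
    # epsilon-ball (the paper's perturbation set is more elaborate).
    noise = (torch.rand(n_samples, batch, obs_dim,
                        device=states.device) * 2 - 1) * epsilon
    perturbed = (states.unsqueeze(0) + noise).reshape(-1, obs_dim)

    rep_actions = actions.repeat(n_samples, 1)
    q_clean = q_net(states, actions).repeat(n_samples, 1)
    q_pert = q_net(perturbed, rep_actions)

    # Value smoothing: keep Q stable under small observation shifts.
    smooth_q = ((q_pert - q_clean.detach()) ** 2).mean()

    # Policy smoothing: keep the chosen actions stable as well (MSE
    # between deterministic policy outputs, for simplicity).
    pi_clean = policy(states).repeat(n_samples, 1)
    pi_pert = policy(perturbed)
    smooth_pi = ((pi_pert - pi_clean.detach()) ** 2).mean()

    # Extra conservatism: penalize Q-values on perturbed states that
    # exceed the in-dataset estimate, since those states lie (slightly)
    # outside the data distribution.
    ood_penalty = torch.relu(q_pert - q_clean.detach()).mean()

    return beta_q * smooth_q + beta_pi * smooth_pi + beta_ood * ood_penalty
```

In practice such a term would be added to a standard offline actor-critic objective; the full method in the paper differs in how it constructs perturbations and estimates conservatism, which this sketch omits.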
Published in: | arXiv.org 2022-10 |
---|---|
Main authors: | Yang, Rui; Bai, Chenjia; Ma, Xiaoteng; Wang, Zhaoran; Zhang, Chongjie; Han, Lei |
Format: | Article |
Language: | English |
Subjects: | Algorithms; Conservatism; Decision making; Learning; Perturbation; Regularization; Robustness; Smoothing; Task complexity |
Online access: | Full text |
creator | Yang, Rui; Bai, Chenjia; Ma, Xiaoteng; Wang, Zhaoran; Zhang, Chongjie; Han, Lei |
description | Offline reinforcement learning (RL) provides a promising direction for exploiting massive amounts of offline data for complex decision-making tasks. Due to the distribution shift issue, current offline RL algorithms are generally designed to be conservative in value estimation and action selection. However, such conservatism can impair the robustness of learned policies when they encounter observation deviations under realistic conditions, such as sensor errors and adversarial attacks. To trade off robustness and conservatism, we propose Robust Offline Reinforcement Learning (RORL) with a novel conservative smoothing technique. In RORL, we explicitly introduce regularization on the policy and the value function for states near the dataset, as well as additional conservative value estimation on these states. Theoretically, we show that RORL enjoys a tighter suboptimality bound than recent theoretical results in linear MDPs. We demonstrate that RORL can achieve state-of-the-art performance on the general offline RL benchmark and is considerably robust to adversarial observation perturbations. |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2022-10 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_2674152750 |
source | Free E-Journals |
subjects | Algorithms; Conservatism; Decision making; Learning; Perturbation; Regularization; Robustness; Smoothing; Task complexity |
title | RORL: Robust Offline Reinforcement Learning via Conservative Smoothing |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-24T05%3A01%3A25IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=RORL:%20Robust%20Offline%20Reinforcement%20Learning%20via%20Conservative%20Smoothing&rft.jtitle=arXiv.org&rft.au=Yang,%20Rui&rft.date=2022-10-22&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2674152750%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2674152750&rft_id=info:pmid/&rfr_iscdi=true |