Absolute Policy Optimization

In recent years, trust-region on-policy reinforcement learning has achieved impressive results on complex control tasks and gaming scenarios. However, contemporary state-of-the-art algorithms in this category primarily emphasize improvement in expected performance and lack the ability to control worst-case performance outcomes. To address this limitation, we introduce a novel objective function whose optimization guarantees monotonic improvement in a lower probability bound of performance with high confidence. Building upon this theoretical advancement, we further introduce a practical solution called Absolute Policy Optimization (APO). Our experiments demonstrate the effectiveness of our approach on challenging continuous-control benchmark tasks and extend its applicability to mastering Atari games. Our findings reveal that APO, as well as its efficient variant Proximal Absolute Policy Optimization (PAPO), significantly outperforms state-of-the-art policy gradient algorithms, yielding substantial improvements in worst-case as well as expected performance.
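The record does not state the paper's objective explicitly. As a rough, illustrative sketch only (not the exact formulation from the paper), a high-confidence lower bound on performance can be written as an expected return penalized by the return's standard deviation, and an APO-style update would maximize such a bound within a trust region. All symbols below are assumptions introduced for illustration:

```latex
% Illustrative sketch only -- not necessarily the paper's exact objective.
% J(\pi)        : expected return of policy \pi
% \sigma^2(\pi) : variance of the return under \pi
% k             : confidence parameter (larger k = a more conservative bound)
% \delta        : trust-region size, as in TRPO-style updates
\mathcal{B}_k(\pi) = J(\pi) - k\,\sqrt{\sigma^2(\pi)}, \qquad
\pi_{\text{new}} = \arg\max_{\pi}\ \mathcal{B}_k(\pi)
\quad \text{s.t.} \quad
\mathbb{E}_{s \sim \pi_{\text{old}}}\!\left[ D_{\mathrm{KL}}\!\left(\pi_{\text{old}}(\cdot \mid s) \,\|\, \pi(\cdot \mid s)\right) \right] \le \delta .
```

The PAPO variant mentioned in the abstract presumably replaces the explicit trust-region constraint with a PPO-style clipped surrogate of the same bound, which is the usual way such constraints are made cheap to optimize; this, too, is an inference rather than a detail given in this record.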


Saved in:
Bibliographic Details
Main Authors: Zhao, Weiye; Li, Feihan; Sun, Yifan; Chen, Rui; Wei, Tianhao; Liu, Changliu
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning; Computer Science - Robotics
Online Access: Order full text
creator Zhao, Weiye
Li, Feihan
Sun, Yifan
Chen, Rui
Wei, Tianhao
Liu, Changliu
description In recent years, trust-region on-policy reinforcement learning has achieved impressive results on complex control tasks and gaming scenarios. However, contemporary state-of-the-art algorithms in this category primarily emphasize improvement in expected performance and lack the ability to control worst-case performance outcomes. To address this limitation, we introduce a novel objective function whose optimization guarantees monotonic improvement in a lower probability bound of performance with high confidence. Building upon this theoretical advancement, we further introduce a practical solution called Absolute Policy Optimization (APO). Our experiments demonstrate the effectiveness of our approach on challenging continuous-control benchmark tasks and extend its applicability to mastering Atari games. Our findings reveal that APO, as well as its efficient variant Proximal Absolute Policy Optimization (PAPO), significantly outperforms state-of-the-art policy gradient algorithms, yielding substantial improvements in worst-case as well as expected performance.
doi_str_mv 10.48550/arxiv.2310.13230
format Article
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2310.13230
language eng
recordid cdi_arxiv_primary_2310_13230
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Learning
Computer Science - Robotics
title Absolute Policy Optimization
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-20T05%3A05%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Absolute%20Policy%20Optimization&rft.au=Zhao,%20Weiye&rft.date=2023-10-19&rft_id=info:doi/10.48550/arxiv.2310.13230&rft_dat=%3Carxiv_GOX%3E2310_13230%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true