Absolute Policy Optimization
Saved in:
Main Authors: , , , , ,
Format: Article
Language: English
Subjects:
Online Access: Order full text
Summary: In recent years, trust region on-policy reinforcement learning has achieved impressive results in addressing complex control tasks and gaming scenarios. However, contemporary state-of-the-art algorithms within this category primarily emphasize improvement in expected performance, lacking the ability to control worst-case performance outcomes. To address this limitation, we introduce a novel objective function whose optimization leads to guaranteed monotonic improvement in the lower probability bound of performance with high confidence. Building upon this groundbreaking theoretical advancement, we further introduce a practical solution called Absolute Policy Optimization (APO). Our experiments demonstrate the effectiveness of our approach across challenging continuous control benchmark tasks and extend its applicability to mastering Atari games. Our findings reveal that APO, as well as its efficient variant Proximal Absolute Policy Optimization (PAPO), significantly outperforms state-of-the-art policy gradient algorithms, resulting in substantial improvements in worst-case performance as well as expected performance.
DOI: 10.48550/arxiv.2310.13230
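
The abstract's central idea, replacing a purely expected-return objective with a lower probability bound on performance, can be illustrated with a small sketch. The snippet below is not the authors' algorithm; the function name `absolute_performance_bound`, the confidence parameter `k`, and the toy data are assumptions used only to show how a Cantelli-style bound of the form mean minus k times standard deviation could serve as a variance-aware surrogate alongside the usual expected return.

```python
import numpy as np


def absolute_performance_bound(returns: np.ndarray, k: float = 2.0) -> float:
    """Hypothetical surrogate for a lower probability bound on performance.

    By Cantelli's inequality, P(R <= mean - k * std) <= 1 / (1 + k**2),
    so mean - k * std lower-bounds performance with confidence at least
    k**2 / (1 + k**2). This is an illustrative stand-in, not the objective
    derived in the APO paper.
    """
    mean = returns.mean()
    std = returns.std()
    return mean - k * std


# Toy comparison: two policies with equal expected return but different spread.
rng = np.random.default_rng(0)
policy_a = rng.normal(loc=100.0, scale=5.0, size=1000)   # low-variance returns
policy_b = rng.normal(loc=100.0, scale=40.0, size=1000)  # high-variance returns

print(absolute_performance_bound(policy_a))  # roughly 100 - 2 * 5  = ~90
print(absolute_performance_bound(policy_b))  # roughly 100 - 2 * 40 = ~20
```

An expectation-only objective would rank the two toy policies almost identically; the bound above prefers the lower-variance one, which is the flavor of worst-case control the abstract describes.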