Policy Gradient Methods for Distortion Risk Measures
Format: | Article |
---|---|
Language: | English |
Abstract: | We propose policy gradient algorithms that learn risk-sensitive policies in
a reinforcement learning (RL) framework. Our proposed algorithms maximize the
distortion risk measure (DRM) of the cumulative reward in an episodic Markov
decision process in the on-policy and off-policy RL settings, respectively. We
derive a variant of the policy gradient theorem that caters to the DRM
objective, and integrate it with a likelihood-ratio-based gradient estimation
scheme. We derive non-asymptotic bounds that establish the convergence of our
proposed algorithms to an approximate stationary point of the DRM objective. |
---|---|
DOI: | 10.48550/arxiv.2107.04422 |
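The objective named in the abstract, the DRM of the cumulative reward, can be illustrated with a plug-in estimate computed from sampled episode returns. This is a minimal sketch, not the authors' code. It assumes the standard definition ρ_g(X) = ∫_0^∞ g(P(X > x)) dx + ∫_{-∞}^0 [g(P(X > x)) − 1] dx for a nondecreasing distortion function g on [0, 1] with g(0) = 0 and g(1) = 1, and it replaces the tail probability with its empirical counterpart. The function names `empirical_drm`, `identity` and `square` are hypothetical, and the two example distortions are illustrative choices rather than ones taken from the paper.

```python
from typing import Callable, Sequence


def empirical_drm(returns: Sequence[float], g: Callable[[float], float]) -> float:
    """Plug-in (L-estimator) estimate of the DRM of sampled episode returns.

    Sorting the returns ascending, the i-th smallest (1-indexed) gets weight
    g((n - i + 1) / n) - g((n - i) / n), which is what substituting the
    empirical tail probability into the DRM integral produces.
    """
    n = len(returns)
    if n == 0:
        raise ValueError("need at least one sampled return")
    xs = sorted(returns)
    total = 0.0
    for i, x in enumerate(xs, start=1):
        weight = g((n - i + 1) / n) - g((n - i) / n)
        total += weight * x
    return total


def identity(t: float) -> float:
    # Hypothetical example distortion g(t) = t: the DRM reduces to the mean return.
    return t


def square(t: float) -> float:
    # Hypothetical convex distortion g(t) = t**2: lower returns get more weight.
    return t * t


if __name__ == "__main__":
    sample = [0.0, 10.0]
    print(empirical_drm(sample, identity))  # 5.0
    print(empirical_drm(sample, square))    # 2.5
```

With g(t) = t the estimate is the sample mean; with the convex g(t) = t² it falls below the mean because the weight shifts toward the lower returns, which is one way a distortion encodes risk sensitivity.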
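The abstract pairs a DRM policy gradient theorem with likelihood-ratio gradient estimation. The following is a rough sketch of that combination under our own assumptions, and it is not claimed to match the paper's exact theorem or algorithm. Write F_θ for the CDF of the episode return under policy parameter θ. Differentiating the DRM under the integral gives ∇ρ_g = −∫ g′(1 − F_θ(x)) ∇F_θ(x) dx, and the likelihood-ratio trick gives ∇F_θ(x) = E[1{R ≤ x} Σ_t ∇ log π_θ(a_t | s_t)]. The hypothetical function `drm_gradient_lr` plugs empirical versions of both terms in, using m sampled episodes. It assumes that g is differentiable with derivative `g_prime`, that the caller supplies each episode's score vector Σ_t ∇ log π_θ(a_t | s_t), and that the integral can be truncated to the range of the sampled returns.

```python
from typing import Callable, List, Sequence


def drm_gradient_lr(
    returns: Sequence[float],
    scores: Sequence[Sequence[float]],
    g_prime: Callable[[float], float],
) -> List[float]:
    """Likelihood-ratio sketch of the gradient of a DRM of the episode return.

    returns[j] is the return of episode j and scores[j] is that episode's
    score vector sum_t grad log pi_theta(a_t | s_t), supplied by the caller.
    The integral over x is truncated to [min(returns), max(returns)].
    """
    m = len(returns)
    if m == 0 or len(scores) != m:
        raise ValueError("need one score vector per sampled return")
    dim = len(scores[0])
    # Visit episodes in ascending order of return, so the empirical CDF
    # is constant on each gap between consecutive sorted returns.
    order = sorted(range(m), key=lambda j: returns[j])
    grad = [0.0] * dim
    cum_score = [0.0] * dim  # sum of scores of episodes with return <= x
    for i in range(m - 1):
        j = order[i]
        for d in range(dim):
            cum_score[d] += scores[j][d]
        width = returns[order[i + 1]] - returns[j]  # length of this gap
        cdf = (i + 1) / m  # empirical F(x) on this gap; zero-length gaps add nothing
        # On this gap the empirical grad F(x) is cum_score / m.
        weight = g_prime(1.0 - cdf) * width / m
        for d in range(dim):
            grad[d] -= weight * cum_score[d]
    return grad


if __name__ == "__main__":
    rets = [1.0, 3.0, 2.0]
    zs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    # g(t) = t (derivative 1) and g(t) = t**2 (derivative 2t).
    print(drm_gradient_lr(rets, zs, lambda t: 1.0))
    print(drm_gradient_lr(rets, zs, lambda t: 2.0 * t))
```

As a sanity check, for g(t) = t this estimator reduces algebraically to (1/m) Σ_j (R_j − max_k R_k) ∇ log p_θ(τ_j), which is REINFORCE with the largest sampled return as a baseline. A gradient-ascent update would then move θ by a step size η (a hypothetical hyperparameter) times the returned vector.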