Examining average and discounted reward optimality criteria in reinforcement learning
Saved in:

| Main Authors: |  |
|---|---|
| Format: | Article |
| Language: | eng |
| Subjects: |  |
| Online Access: | Order full text |
Abstract: In reinforcement learning (RL), the goal is to obtain an optimal policy, for which the optimality criterion is fundamentally important. Two major optimality criteria are average and discounted rewards. While the latter is more popular, it is problematic to apply in environments without an inherent notion of discounting. This motivates us to revisit a) the progression of optimality criteria in dynamic programming, b) the justification for and complications of an artificial discount factor, and c) the benefits of directly maximizing the average-reward criterion, which is discounting-free. Our contributions include a thorough examination of the relationship between average and discounted rewards, as well as a discussion of their pros and cons in RL. We emphasize that average-reward RL methods possess the ingredients and mechanisms for applying a family of discounting-free optimality criteria (Veinott, 1969) to RL.
DOI: 10.48550/arxiv.2107.01348
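For reference, the two optimality criteria contrasted in the abstract are conventionally defined as follows; this is a standard textbook formulation given here as an illustration, and the notation is an assumption rather than the paper's own.

```latex
% Discounted-reward value of a stationary policy \pi from state s,
% with an artificial discount factor \gamma \in [0, 1):
v_\gamma^\pi(s) = \mathbb{E}_\pi\!\left[ \sum_{t=0}^{\infty} \gamma^t\, r_t \,\middle|\, s_0 = s \right]

% Average reward (gain) of \pi from state s, which is discounting-free:
g^\pi(s) = \lim_{N \to \infty} \frac{1}{N}\, \mathbb{E}_\pi\!\left[ \sum_{t=0}^{N-1} r_t \,\middle|\, s_0 = s \right]
```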