Beyond Exponentially Discounted Sum: Automatic Learning of Return Function
In reinforcement learning, Return, which is the weighted accumulated future rewards, and Value, which is the expected return, serve as the objective that guides the learning of the policy. In classic RL, return is defined as the exponentially discounted sum of future rewards. One key insight is that...
Gespeichert in:
Hauptverfasser: | , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Schreiben Sie den ersten Kommentar!