Beyond Exponentially Discounted Sum: Automatic Learning of Return Function

In reinforcement learning, the return, which is the weighted accumulation of future rewards, and the value, which is the expected return, serve as the objectives that guide the learning of the policy. In classic RL, the return is defined as the exponentially discounted sum of future rewards. One key insight is that...
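As a point of reference for the classic definition mentioned above, the exponentially discounted return can be sketched as follows. This is only an illustration of the standard textbook formula, not the learned return function the article proposes; the function name `discounted_return` and the default discount factor are assumptions introduced here.

```python
from typing import Sequence


def discounted_return(rewards: Sequence[float], gamma: float = 0.99) -> float:
    """Classic return: G = sum over k of gamma**k * rewards[k].

    `rewards` holds the rewards observed from the current step onward;
    `gamma` is the discount factor (hypothetical default, chosen for illustration).
    """
    if not 0.0 <= gamma <= 1.0:
        raise ValueError("gamma must lie in [0, 1]")
    g = 0.0
    # Walk backwards so each step applies the recursion G_t = r_t + gamma * G_{t+1}.
    for r in reversed(rewards):
        g = r + gamma * g
    return g


if __name__ == "__main__":
    # Three unit rewards with gamma = 0.5 give 1 + 0.5 + 0.25.
    print(discounted_return([1.0, 1.0, 1.0], gamma=0.5))
```

The value of a state would then be the expectation of this quantity over trajectories, which in practice is often estimated by averaging the returns of sampled episodes.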

Bibliographic details
Main authors: Wang, Yufei; Ye, Qiwei; Liu, Tie-Yan
Format: Article
Language: English