REINFORCEMENT LEARNING WITH ADAPTIVE RETURN COMPUTATION SCHEMES

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for reinforcement learning with adaptive return computation schemes. In one aspect, a method includes: maintaining data specifying a policy for selecting between multiple different return computation s...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Sprechmann, Pablo, Vitvitskyi, Alex, Guo, Zhaohan, Piot, Bilal, Badia, Adrià Puigdomènech, Kapturowski, Steven James, Blundell, Charles
Format: Patent
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for reinforcement learning with adaptive return computation schemes. In one aspect, a method includes: maintaining data specifying a policy for selecting between multiple different return computation schemes, each return computation scheme assigning a different importance to exploring the environment while performing an episode of a task; selecting, using the policy, a return computation scheme from the multiple different return computation schemes; controlling an agent to perform the episode of the task to maximize a return computed according to the selected return computation scheme; identifying rewards that were generated as a result of the agent performing the episode of the task; and updating, using the identified rewards, the policy for selecting between multiple different return computation schemes.