DISTRIBUTED TRAINING USING OFF-POLICY ACTOR-CRITIC REINFORCEMENT LEARNING

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing unit...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	KAVUKCUOGLU, Koray, HARLEY, Timothy James Alexander, MUNOS, Remi, SOYER, Hubert Josef, DORON, Yotam, FIROIU, Vlad, ESPEHOLT, Lasse, MNIH, Volodymyr, DUNNING, Iain, WARD, Thomas, SIMONYAN, Karen
Format:	Patent
Sprache:	eng ; fre ; ger
Schlagworte:	CALCULATING COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS COMPUTING COUNTING PHYSICS
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training an action selection neural network used to select actions to be performed by an agent interacting with an environment. In one aspect, a system comprises a plurality of actor computing units and a plurality of learner computing units. The actor computing units generate experience tuple trajectories that are used by the learner computing units to update learner action selection neural network parameters using a reinforcement learning technique. The reinforcement learning technique may be an off-policy actor critic reinforcement learning technique.