TRAINING REINFORCEMENT LEARNING NEURAL NETWORKS


Bibliographic details
Main authors:	VAN HASSELT, Hado Philip; GUEZ, Arthur Clement
Format:	Patent
Languages:	English; French
Subjects:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; PHYSICS

Description
Summary:	Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training a Q network used to select actions to be performed by an agent interacting with an environment. One of the methods includes obtaining a plurality of experience tuples and training the Q network on each of the experience tuples using the Q network and a target Q network that is identical to the Q network but with the current values of the parameters of the target Q network being different from the current values of the parameters of the Q network.
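
The summary does not say how the two networks are combined during training. As a rough illustration only, the sketch below assumes one common arrangement: the Q network picks the next action and the target Q network, which has the same structure but different parameter values, evaluates it. The linear Q function, the tuple layout (observation, action, reward, next observation), and the names q_values, td_target, and train_step are all hypothetical and are not taken from the patent.

```python
def q_values(params, observation):
    """Linear stand-in for a Q network: one estimated return per action."""
    return [sum(w * x for w, x in zip(row, observation)) for row in params]


def td_target(online_params, target_params, reward, next_observation, discount):
    """Build a training target from both parameter sets (assumed scheme)."""
    # Assumption: the Q network being trained selects the next action ...
    online_next = q_values(online_params, next_observation)
    best_action = max(range(len(online_next)), key=online_next.__getitem__)
    # ... and the target Q network supplies the value estimate for that action.
    target_next = q_values(target_params, next_observation)
    return reward + discount * target_next[best_action]


def train_step(online_params, target_params, experience, lr=0.1, discount=0.99):
    """One gradient-style update of the Q network on one experience tuple."""
    observation, action, reward, next_observation = experience
    target = td_target(online_params, target_params, reward,
                       next_observation, discount)
    error = target - q_values(online_params, observation)[action]
    # Copy the rows so the caller's parameters are left untouched.
    updated = [list(row) for row in online_params]
    updated[action] = [w + lr * error * x
                       for w, x in zip(updated[action], observation)]
    return updated
```

In this sketch only the Q network's parameters change in train_step; how and when the target Q network's parameters are refreshed is not covered by the summary and is left out here.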