NEURAL EPISODIC CONTROL


Bibliographic Details
Main Authors:	PRITZEL, Alexander, BADIA, Adria Puigdomenech, URIA-MARTÍNEZ, Benigno, BLUNDELL, Charles
Format:	Patent
Language:	English; French
Subjects:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; PHYSICS

Description
Summary:	A method includes maintaining respective episodic memory data for each of multiple actions; receiving a current observation characterizing a current state of an environment being interacted with by an agent; processing the current observation using an embedding neural network in accordance with current values of parameters of the embedding neural network to generate a current key embedding for the current observation; for each action of the plurality of actions: determining the p nearest key embeddings in the episodic memory data for the action to the current key embedding according to a distance measure, and determining a Q value for the action from the return estimates mapped to by the p nearest key embeddings in the episodic memory data for the action; and selecting, using the Q values for the actions, an action from the multiple actions as the action to be performed by the agent.
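
The steps in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation. Several details are assumptions because the summary does not specify them: the one-layer `embed` function stands in for the embedding neural network, squared Euclidean distance is used as the distance measure, the Q value is an inverse-distance-weighted average of the return estimates, and the action is chosen greedily. All function and variable names are hypothetical.

```python
import math


def embed(observation, weights):
    """Stand-in for the embedding neural network (assumption: linear layer + tanh)."""
    return [math.tanh(sum(w * x for w, x in zip(row, observation))) for row in weights]


def squared_distance(a, b):
    """Distance measure (assumption: squared Euclidean distance)."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def q_value(memory, key, p, delta=1e-3):
    """Estimate Q for one action from its p nearest stored key embeddings.

    memory is a list of (key_embedding, return_estimate) pairs.
    Assumption: inverse-distance weighting of the return estimates.
    """
    nearest = sorted(memory, key=lambda entry: squared_distance(entry[0], key))[:p]
    if not nearest:
        return 0.0  # assumption: default value when an action has no memory yet
    weights = [1.0 / (squared_distance(k, key) + delta) for k, _ in nearest]
    total = sum(weights)
    return sum(w * ret for w, (_, ret) in zip(weights, nearest)) / total


def select_action(memories, observation, weights, p=2):
    """Compute one Q value per action and pick the greedy action (assumption)."""
    key = embed(observation, weights)
    q_values = {action: q_value(mem, key, p) for action, mem in memories.items()}
    return max(q_values, key=q_values.get), q_values


# Hypothetical data: one episodic memory per action, identity embedding weights.
EMBED_WEIGHTS = [[1.0, 0.0], [0.0, 1.0]]
memories = {
    "left": [([0.0, 0.0], 1.0), ([0.9, 0.9], 5.0)],
    "right": [([0.0, 0.0], 4.0), ([0.9, 0.9], 0.0)],
}
action, q_values = select_action(memories, [0.0, 0.0], EMBED_WEIGHTS, p=1)
```

With `p=1`, each Q value is the return estimate stored under the single closest key, so the observation `[0.0, 0.0]` selects `"right"`. A larger `p` blends several stored returns, weighted by how close their keys are to the current key embedding.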