ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING

Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through a...

Detailed description

Bibliographic details
Main authors:	HADSELL, Raia Thais, KAVUKCUOGLU, Koray, DENIL, Misha Man Ray, BANINO, Andrea, SOYER, Hubert Josef, MIROWSKI, Piotr Wojciech, GOROSHIN, Rostislav, SIFRE, Laurent, VIOLA, Fabio, KUMARAN, Sudarshan, PASCANU, Razvan, BALLARD, Andrew James
Format:	Patent
Language:	eng ; fre
Subjects:	CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; PHYSICS

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	HADSELL, Raia Thais ; KAVUKCUOGLU, Koray ; DENIL, Misha Man Ray ; BANINO, Andrea ; SOYER, Hubert Josef ; MIROWSKI, Piotr Wojciech ; GOROSHIN, Rostislav ; SIFRE, Laurent ; VIOLA, Fabio ; KUMARAN, Sudarshan ; PASCANU, Razvan ; BALLARD, Andrew James
description	Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a geometry-prediction neural network, an intermediate output generated by the action selection policy neural network to predict a value of a feature of a geometry of the environment when in the current state; and backpropagating a gradient of a geometry-based auxiliary loss into the action selection policy neural network to determine a geometry-based auxiliary update for current values of the network parameters.
format	Patent
fullrecord	<record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_WO2018083672A1</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>WO2018083672A1</sourcerecordid><originalsourceid>FETCH-epo_espacenet_WO2018083672A13</originalsourceid><addsrcrecordid>eNrjZDB29QvzDPL383X1C1HwcwzzdHcM8fT3UwgN9vRzVwhy9fRz8w9ydgVL-7g6BvkBhXkYWNMSc4pTeaE0N4Oym2uIs4duakF-fGpxQWJyal5qSXy4v5GBoYWBhbGZuZGjoTFxqgB0KyiA</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING</title><source>esp@cenet</source><creator>HADSELL, Raia Thais ; KAVUKCUOGLU, Koray ; DENIL, Misha Man Ray ; BANINO, Andrea ; SOYER, Hubert Josef ; MIROWSKI, Piotr Wojciech ; GOROSHIN, Rostislav ; SIFRE, Laurent ; VIOLA, Fabio ; KUMARAN, Sudarshan ; PASCANU, Razvan ; BALLARD, Andrew James</creator><creatorcontrib>HADSELL, Raia Thais ; KAVUKCUOGLU, Koray ; DENIL, Misha Man Ray ; BANINO, Andrea ; SOYER, Hubert Josef ; MIROWSKI, Piotr Wojciech ; GOROSHIN, Rostislav ; SIFRE, Laurent ; VIOLA, Fabio ; KUMARAN, Sudarshan ; PASCANU, Razvan ; BALLARD, Andrew James</creatorcontrib><description>Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. 
In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a geometry-prediction neural network, an intermediate output generated by the action selection policy neural network to predict a value of a feature of a geometry of the environment when in the current state; and backpropagating a gradient of a geometry-based auxiliary loss into the action selection policy neural network to determine a geometry-based auxiliary update for current values of the network parameters.</description><language>eng ; fre</language><subject>CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; 
PHYSICS</subject><creationdate>2018</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20180511&DB=EPODOC&CC=WO&NR=2018083672A1$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,776,881,25542,76290</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20180511&DB=EPODOC&CC=WO&NR=2018083672A1$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>HADSELL, Raia Thais</creatorcontrib><creatorcontrib>KAVUKCUOGLU, Koray</creatorcontrib><creatorcontrib>DENIL, Misha Man Ray</creatorcontrib><creatorcontrib>BANINO, Andrea</creatorcontrib><creatorcontrib>SOYER, Hubert Josef</creatorcontrib><creatorcontrib>MIROWSKI, Piotr Wojciech</creatorcontrib><creatorcontrib>GOROSHIN, Rostislav</creatorcontrib><creatorcontrib>SIFRE, Laurent</creatorcontrib><creatorcontrib>VIOLA, Fabio</creatorcontrib><creatorcontrib>KUMARAN, Sudarshan</creatorcontrib><creatorcontrib>PASCANU, Razvan</creatorcontrib><creatorcontrib>BALLARD, Andrew James</creatorcontrib><title>ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING</title><description>Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. 
In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a geometry-prediction neural network, an intermediate output generated by the action selection policy neural network to predict a value of a feature of a geometry of the environment when in the current state; and backpropagating a gradient of a geometry-based auxiliary loss into the action selection policy neural network to determine a geometry-based auxiliary update for current values of the network parameters.</description><subject>CALCULATING</subject><subject>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>PHYSICS</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2018</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZDB29QvzDPL383X1C1HwcwzzdHcM8fT3UwgN9vRzVwhy9fRz8w9ydgVL-7g6BvkBhXkYWNMSc4pTeaE0N4Oym2uIs4duakF-fGpxQWJyal5qSXy4v5GBoYWBhbGZuZGjoTFxqgB0KyiA</recordid><startdate>20180511</startdate><enddate>20180511</enddate><creator>HADSELL, Raia Thais</creator><creator>KAVUKCUOGLU, Koray</creator><creator>DENIL, Misha Man Ray</creator><creator>BANINO, Andrea</creator><creator>SOYER, Hubert Josef</creator><creator>MIROWSKI, Piotr Wojciech</creator><creator>GOROSHIN, Rostislav</creator><creator>SIFRE, Laurent</creator><creator>VIOLA, Fabio</creator><creator>KUMARAN, 
Sudarshan</creator><creator>PASCANU, Razvan</creator><creator>BALLARD, Andrew James</creator><scope>EVB</scope></search><sort><creationdate>20180511</creationdate><title>ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING</title><author>HADSELL, Raia Thais ; KAVUKCUOGLU, Koray ; DENIL, Misha Man Ray ; BANINO, Andrea ; SOYER, Hubert Josef ; MIROWSKI, Piotr Wojciech ; GOROSHIN, Rostislav ; SIFRE, Laurent ; VIOLA, Fabio ; KUMARAN, Sudarshan ; PASCANU, Razvan ; BALLARD, Andrew James</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_WO2018083672A13</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng ; fre</language><creationdate>2018</creationdate><topic>CALCULATING</topic><topic>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>PHYSICS</topic><toplevel>online_resources</toplevel><creatorcontrib>HADSELL, Raia Thais</creatorcontrib><creatorcontrib>KAVUKCUOGLU, Koray</creatorcontrib><creatorcontrib>DENIL, Misha Man Ray</creatorcontrib><creatorcontrib>BANINO, Andrea</creatorcontrib><creatorcontrib>SOYER, Hubert Josef</creatorcontrib><creatorcontrib>MIROWSKI, Piotr Wojciech</creatorcontrib><creatorcontrib>GOROSHIN, Rostislav</creatorcontrib><creatorcontrib>SIFRE, Laurent</creatorcontrib><creatorcontrib>VIOLA, Fabio</creatorcontrib><creatorcontrib>KUMARAN, Sudarshan</creatorcontrib><creatorcontrib>PASCANU, Razvan</creatorcontrib><creatorcontrib>BALLARD, Andrew James</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>HADSELL, Raia Thais</au><au>KAVUKCUOGLU, Koray</au><au>DENIL, Misha Man Ray</au><au>BANINO, Andrea</au><au>SOYER, Hubert Josef</au><au>MIROWSKI, Piotr Wojciech</au><au>GOROSHIN, Rostislav</au><au>SIFRE, Laurent</au><au>VIOLA, Fabio</au><au>KUMARAN, Sudarshan</au><au>PASCANU, Razvan</au><au>BALLARD, Andrew 
James</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING</title><date>2018-05-11</date><risdate>2018</risdate><abstract>Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a geometry-prediction neural network, an intermediate output generated by the action selection policy neural network to predict a value of a feature of a geometry of the environment when in the current state; and backpropagating a gradient of a geometry-based auxiliary loss into the action selection policy neural network to determine a geometry-based auxiliary update for current values of the network parameters.</abstract><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier
ispartof
issn
language	eng ; fre
recordid	cdi_epo_espacenet_WO2018083672A1
source	esp@cenet
subjects	CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; PHYSICS
title	ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-01T08%3A10%3A52IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=HADSELL,%20Raia%20Thais&rft.date=2018-05-11&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EWO2018083672A1%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
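
The training step summarized in the description field can be illustrated with a small numerical sketch. It is not the patented implementation. It assumes a one-layer tanh encoder as the "intermediate output", a linear geometry head, a depth-like target vector as the geometry feature, and a squared-error auxiliary loss. All names (`W_enc`, `W_pi`, `W_geo`, `init_params`, `geometry_aux_grads`, `apply_aux_update`) are hypothetical and do not come from the record.

```python
import numpy as np

def init_params(rng, n_pixels, n_hidden, n_actions, n_geo):
    """Small random weights for a toy policy network and geometry head (assumed shapes)."""
    return {
        "W_enc": 0.1 * rng.normal(size=(n_hidden, n_pixels)),   # policy network encoder
        "W_pi": 0.1 * rng.normal(size=(n_actions, n_hidden)),   # action selection head
        "W_geo": 0.1 * rng.normal(size=(n_geo, n_hidden)),      # geometry-prediction network
    }

def forward(params, image):
    """Process one observation image with the policy network and the geometry head."""
    x = image.reshape(-1)                  # flatten the observation image
    h = np.tanh(params["W_enc"] @ x)       # intermediate output of the policy network
    logits = params["W_pi"] @ h            # action selection output (unnormalized scores)
    geo_pred = params["W_geo"] @ h         # predicted geometry feature (assumed: depth-like vector)
    return x, h, logits, geo_pred

def geometry_aux_loss(params, image, geo_target):
    """Squared-error auxiliary loss (an assumption; the abstract does not fix the loss form)."""
    _, _, _, geo_pred = forward(params, image)
    return 0.5 * np.sum((geo_pred - geo_target) ** 2)

def geometry_aux_grads(params, image, geo_target):
    """Backpropagate the auxiliary loss through the geometry head into the encoder."""
    x, h, _, geo_pred = forward(params, image)
    err = geo_pred - geo_target            # dL/d(geo_pred)
    grad_geo = np.outer(err, h)            # dL/dW_geo
    dh = params["W_geo"].T @ err           # gradient entering the policy network
    dpre = dh * (1.0 - h ** 2)             # derivative of tanh
    grad_enc = np.outer(dpre, x)           # dL/dW_enc: the geometry-based auxiliary gradient
    return {"W_enc": grad_enc, "W_geo": grad_geo}

def apply_aux_update(params, grads, lr=0.01):
    """Plain gradient-descent step on the parameters touched by the auxiliary loss."""
    new_params = dict(params)
    for name, grad in grads.items():
        new_params[name] = params[name] - lr * grad
    return new_params
```

In this sketch the action head `W_pi` receives no auxiliary gradient, because the geometry head reads only the intermediate output `h`. In a full agent, this update would be combined with the main reinforcement learning update.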