Environment navigation using reinforcement learning
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals is described.
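The record's description (in the full record below) explains that an intermediate output of the action selection policy network is processed by a geometry-prediction network, and the gradient of a geometry-based auxiliary loss is backpropagated into the policy network to produce an auxiliary update of its parameters. The following is a minimal sketch of such an auxiliary update in PyTorch; the network shapes, the choice of a coarse depth map as the geometric feature, and all identifiers are illustrative assumptions, not the patented implementation.

```python
# Minimal sketch (not the patented implementation) of a geometry-based
# auxiliary update: a policy network exposes an intermediate output, a
# geometry-prediction head predicts a geometric feature from it, and the
# auxiliary loss gradient flows back into the policy network's parameters.
import torch
import torch.nn as nn
import torch.nn.functional as F


class PolicyNetwork(nn.Module):
    """Action selection policy network; also returns an intermediate output."""

    def __init__(self, num_actions=4, hidden_dim=256):
        super().__init__()
        # Convolutional encoder over the observation image (assumes 84x84 RGB input).
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, 4, stride=2), nn.ReLU(),
            nn.Flatten(),
        )
        self.fc = nn.Linear(32 * 9 * 9, hidden_dim)
        self.policy_head = nn.Linear(hidden_dim, num_actions)

    def forward(self, obs):
        # Intermediate output that is also fed to the geometry-prediction network.
        intermediate = F.relu(self.fc(self.encoder(obs)))
        action_logits = self.policy_head(intermediate)  # action selection output
        return action_logits, intermediate


class GeometryPredictionNetwork(nn.Module):
    """Predicts a geometric feature (here: a coarse depth map) from the intermediate output."""

    def __init__(self, hidden_dim=256, depth_cells=64):
        super().__init__()
        self.head = nn.Linear(hidden_dim, depth_cells)

    def forward(self, intermediate):
        return self.head(intermediate)


policy_net = PolicyNetwork()
geometry_net = GeometryPredictionNetwork()
optimizer = torch.optim.Adam(
    list(policy_net.parameters()) + list(geometry_net.parameters()), lr=1e-4
)

# One illustrative training step on a dummy batch.
obs = torch.rand(8, 3, 84, 84)    # observation images characterizing the current state
true_depth = torch.rand(8, 64)    # stand-in target for the observed geometric feature

action_logits, intermediate = policy_net(obs)
predicted_depth = geometry_net(intermediate)

# Geometry-based auxiliary loss; its gradient is backpropagated through the
# intermediate output into the action selection policy network.
aux_loss = F.mse_loss(predicted_depth, true_depth)

optimizer.zero_grad()
aux_loss.backward()   # backpropagate the auxiliary gradient into the policy network
optimizer.step()      # geometry-based auxiliary update of the current parameter values
```

In practice this auxiliary loss would be combined with the main reinforcement learning objective (for example an actor-critic loss) rather than optimized on its own; the sketch isolates only the geometry-based auxiliary update described in the abstract.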
Main authors: Soyer, Hubert Josef; Sifre, Laurent; Pascanu, Razvan; Banino, Andrea; Ballard, Andrew James; Viola, Fabio; Mirowski, Piotr Wojciech; Kumaran, Sudarshan; Hadsell, Raia Thais; Goroshin, Rostislav; Kavukcuoglu, Koray; Denil, Misha Man Ray
Format: Patent
Language: English
Online access: Order full text
creator | Soyer, Hubert Josef; Sifre, Laurent; Pascanu, Razvan; Banino, Andrea; Ballard, Andrew James; Viola, Fabio; Mirowski, Piotr Wojciech; Kumaran, Sudarshan; Hadsell, Raia Thais; Goroshin, Rostislav; Kavukcuoglu, Koray; Denil, Misha Man Ray |
description | Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a geometry-prediction neural network, an intermediate output generated by the action selection policy neural network to predict a value of a feature of a geometry of the environment when in the current state; and backpropagating a gradient of a geometry-based auxiliary loss into the action selection policy neural network to determine a geometry-based auxiliary update for current values of the network parameters. |
format | Patent |
fulltext | fulltext_linktorsrc |
language | eng |
recordid | cdi_epo_espacenet_US11074481B2 |
source | esp@cenet |
subjects | CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; HANDLING RECORD CARRIERS; IMAGE DATA PROCESSING OR GENERATION, IN GENERAL; PHYSICS; PRESENTATION OF DATA; RECOGNITION OF DATA; RECORD CARRIERS |
title | Environment navigation using reinforcement learning |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T01%3A58%3A17IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=Soyer,%20Hubert%20Josef&rft.date=2021-07-27&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EUS11074481B2%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |