ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING
Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through a...
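Main authors: | HADSELL, Raia Thais ; KAVUKCUOGLU, Koray ; DENIL, Misha Man Ray ; BANINO, Andrea ; SOYER, Hubert Josef ; MIROWSKI, Piotr Wojciech ; GOROSHIN, Rostislav ; SIFRE, Laurent ; VIOLA, Fabio ; KUMARAN, Sudarshan ; PASCANU, Razvan ; BALLARD, Andrew James |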
---|---|
Format: | Patent |
Language: | eng ; fre ; ger |
Subjects: | CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; PHYSICS |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | HADSELL, Raia Thais ; KAVUKCUOGLU, Koray ; DENIL, Misha Man Ray ; BANINO, Andrea ; SOYER, Hubert Josef ; MIROWSKI, Piotr Wojciech ; GOROSHIN, Rostislav ; SIFRE, Laurent ; VIOLA, Fabio ; KUMARAN, Sudarshan ; PASCANU, Razvan ; BALLARD, Andrew James |
description | Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a loop closure prediction neural network, an intermediate output generated by the action selection policy neural network to predict whether the agent has returned to a location in the environment that the agent has already visited; and backpropagating a gradient of a loop closure based auxiliary loss into the action selection policy neural network to determine a loop closure based auxiliary update for current values of the network parameters. |
format | Patent |
fullrecord | <record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_EP4386624A2</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>EP4386624A2</sourcerecordid><originalsourceid>FETCH-epo_espacenet_EP4386624A23</originalsourceid><addsrcrecordid>eNrjZDB29QvzDPL383X1C1HwcwzzdHcM8fT3UwgN9vRzVwhy9fRz8w9ydgVL-7g6BvkBhXkYWNMSc4pTeaE0N4OCm2uIs4duakF-fGpxQWJyal5qSbxrgImxhZmZkYmjkTERSgCKeCcy</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING</title><source>esp@cenet</source><creator>HADSELL, Raia Thais ; KAVUKCUOGLU, Koray ; DENIL, Misha Man Ray ; BANINO, Andrea ; SOYER, Hubert Josef ; MIROWSKI, Piotr Wojciech ; GOROSHIN, Rostislav ; SIFRE, Laurent ; VIOLA, Fabio ; KUMARAN, Sudarshan ; PASCANU, Razvan ; BALLARD, Andrew James</creator><creatorcontrib>HADSELL, Raia Thais ; KAVUKCUOGLU, Koray ; DENIL, Misha Man Ray ; BANINO, Andrea ; SOYER, Hubert Josef ; MIROWSKI, Piotr Wojciech ; GOROSHIN, Rostislav ; SIFRE, Laurent ; VIOLA, Fabio ; KUMARAN, Sudarshan ; PASCANU, Razvan ; BALLARD, Andrew James</creatorcontrib><description>Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. 
In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a loop closure prediction neural network, an intermediate output generated by the action selection policy neural network to predict whether the agent has returned to a location in the environment that the agent has already visited; and backpropagating a gradient of a loop closure based auxiliary loss into the action selection policy neural network to determine a loop closure based auxiliary update for current values of the network parameters.</description><language>eng ; fre ; ger</language><subject>CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; PHYSICS</subject><creationdate>2024</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240619&DB=EPODOC&CC=EP&NR=4386624A2$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,780,885,25564,76547</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240619&DB=EPODOC&CC=EP&NR=4386624A2$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>HADSELL, Raia Thais</creatorcontrib><creatorcontrib>KAVUKCUOGLU, Koray</creatorcontrib><creatorcontrib>DENIL, Misha Man 
Ray</creatorcontrib><creatorcontrib>BANINO, Andrea</creatorcontrib><creatorcontrib>SOYER, Hubert Josef</creatorcontrib><creatorcontrib>MIROWSKI, Piotr Wojciech</creatorcontrib><creatorcontrib>GOROSHIN, Rostislav</creatorcontrib><creatorcontrib>SIFRE, Laurent</creatorcontrib><creatorcontrib>VIOLA, Fabio</creatorcontrib><creatorcontrib>KUMARAN, Sudarshan</creatorcontrib><creatorcontrib>PASCANU, Razvan</creatorcontrib><creatorcontrib>BALLARD, Andrew James</creatorcontrib><title>ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING</title><description>Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a loop closure prediction neural network, an intermediate output generated by the action selection policy neural network to predict whether the agent has returned to a location in the environment that the agent has already visited; and backpropagating a gradient of a loop closure based auxiliary loss into the action selection policy neural network to determine a loop closure based auxiliary update for current values of the network parameters.</description><subject>CALCULATING</subject><subject>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL 
MODELS</subject><subject>COMPUTING</subject><subject>COUNTING</subject><subject>PHYSICS</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2024</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNrjZDB29QvzDPL383X1C1HwcwzzdHcM8fT3UwgN9vRzVwhy9fRz8w9ydgVL-7g6BvkBhXkYWNMSc4pTeaE0N4OCm2uIs4duakF-fGpxQWJyal5qSbxrgImxhZmZkYmjkTERSgCKeCcy</recordid><startdate>20240619</startdate><enddate>20240619</enddate><creator>HADSELL, Raia Thais</creator><creator>KAVUKCUOGLU, Koray</creator><creator>DENIL, Misha Man Ray</creator><creator>BANINO, Andrea</creator><creator>SOYER, Hubert Josef</creator><creator>MIROWSKI, Piotr Wojciech</creator><creator>GOROSHIN, Rostislav</creator><creator>SIFRE, Laurent</creator><creator>VIOLA, Fabio</creator><creator>KUMARAN, Sudarshan</creator><creator>PASCANU, Razvan</creator><creator>BALLARD, Andrew James</creator><scope>EVB</scope></search><sort><creationdate>20240619</creationdate><title>ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING</title><author>HADSELL, Raia Thais ; KAVUKCUOGLU, Koray ; DENIL, Misha Man Ray ; BANINO, Andrea ; SOYER, Hubert Josef ; MIROWSKI, Piotr Wojciech ; GOROSHIN, Rostislav ; SIFRE, Laurent ; VIOLA, Fabio ; KUMARAN, Sudarshan ; PASCANU, Razvan ; BALLARD, Andrew James</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_EP4386624A23</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>eng ; fre ; ger</language><creationdate>2024</creationdate><topic>CALCULATING</topic><topic>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</topic><topic>COMPUTING</topic><topic>COUNTING</topic><topic>PHYSICS</topic><toplevel>online_resources</toplevel><creatorcontrib>HADSELL, Raia Thais</creatorcontrib><creatorcontrib>KAVUKCUOGLU, Koray</creatorcontrib><creatorcontrib>DENIL, Misha Man Ray</creatorcontrib><creatorcontrib>BANINO, Andrea</creatorcontrib><creatorcontrib>SOYER, Hubert 
Josef</creatorcontrib><creatorcontrib>MIROWSKI, Piotr Wojciech</creatorcontrib><creatorcontrib>GOROSHIN, Rostislav</creatorcontrib><creatorcontrib>SIFRE, Laurent</creatorcontrib><creatorcontrib>VIOLA, Fabio</creatorcontrib><creatorcontrib>KUMARAN, Sudarshan</creatorcontrib><creatorcontrib>PASCANU, Razvan</creatorcontrib><creatorcontrib>BALLARD, Andrew James</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>HADSELL, Raia Thais</au><au>KAVUKCUOGLU, Koray</au><au>DENIL, Misha Man Ray</au><au>BANINO, Andrea</au><au>SOYER, Hubert Josef</au><au>MIROWSKI, Piotr Wojciech</au><au>GOROSHIN, Rostislav</au><au>SIFRE, Laurent</au><au>VIOLA, Fabio</au><au>KUMARAN, Sudarshan</au><au>PASCANU, Razvan</au><au>BALLARD, Andrew James</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING</title><date>2024-06-19</date><risdate>2024</risdate><abstract>Methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for training a reinforcement learning system. 
In one aspect, a method of training an action selection policy neural network for use in selecting actions to be performed by an agent navigating through an environment to accomplish one or more goals comprises: receiving an observation image characterizing a current state of the environment; processing, using the action selection policy neural network, an input comprising the observation image to generate an action selection output; processing, using a loop closure prediction neural network, an intermediate output generated by the action selection policy neural network to predict whether the agent has returned to a location in the environment that the agent has already visited; and backpropagating a gradient of a loop closure based auxiliary loss into the action selection policy neural network to determine a loop closure based auxiliary update for current values of the network parameters.</abstract><oa>free_for_read</oa></addata></record> |
fulltext | fulltext_linktorsrc |
identifier | |
ispartof | |
issn | |
language | eng ; fre ; ger |
recordid | cdi_epo_espacenet_EP4386624A2 |
source | esp@cenet |
subjects | CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; COUNTING ; PHYSICS |
title | ENVIRONMENT NAVIGATION USING REINFORCEMENT LEARNING |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-05T01%3A39%3A35IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=HADSELL,%20Raia%20Thais&rft.date=2024-06-19&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3EEP4386624A2%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |
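
The loop closure auxiliary step named in the description can be sketched as follows. This is a minimal illustration, not the patented implementation: the record does not say how loop closure labels are derived or which loss is used, so the distance-threshold label, the single logistic prediction head, the binary cross-entropy loss, and every name here (`loop_closure_label`, `loop_closure_loss_and_grads`, `radius`, `hidden`) are assumptions made for the example.

```python
import math


def loop_closure_label(position, visited, radius=1.0):
    """Return 1.0 if `position` is within `radius` of any earlier position.

    Assumption: the agent's position is available during training and
    "already visited" is approximated by a simple distance threshold.
    """
    near = any(math.dist(position, p) <= radius for p in visited)
    return 1.0 if near else 0.0


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def loop_closure_loss_and_grads(hidden, weights, bias, label):
    """Binary cross-entropy of a logistic head applied to `hidden`.

    `hidden` stands in for the intermediate output of the action selection
    policy network. Returns (loss, grad_hidden, grad_weights, grad_bias);
    `grad_hidden` is the gradient that would be backpropagated further
    into the policy network to form the auxiliary update.
    """
    logit = sum(w * h for w, h in zip(weights, hidden)) + bias
    prob = sigmoid(logit)
    eps = 1e-12  # guards against log(0)
    loss = -(label * math.log(prob + eps)
             + (1.0 - label) * math.log(1.0 - prob + eps))
    dlogit = prob - label  # derivative of the loss w.r.t. the logit
    grad_hidden = [dlogit * w for w in weights]
    grad_weights = [dlogit * h for h in hidden]
    return loss, grad_hidden, grad_weights, dlogit


# Example: the agent is back near its starting point, so the label is 1.0.
visited = [(0.0, 0.0), (5.0, 0.0)]
label = loop_closure_label((0.5, 0.0), visited)
loss, g_hidden, g_weights, g_bias = loop_closure_loss_and_grads(
    hidden=[0.2, -0.1], weights=[0.0, 0.0], bias=0.0, label=label
)
```

In a full system `grad_hidden` would be passed through the policy network's own backward pass and combined with the main reinforcement learning update; how the two updates are weighted is not stated in this record.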