Continuous control with deep reinforcement learning

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an actor neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes obtaining a minibatch of experience tuples; and updating current values of the parameters of the actor neural network, comprising: for each experience tuple in the minibatch: processing the training observation and the training action in the experience tuple using a critic neural network to determine a neural network output for the experience tuple, and determining a target neural network output for the experience tuple; updating current values of the parameters of the critic neural network using errors between the target neural network outputs and the neural network outputs; and updating the current values of the parameters of the actor neural network using the critic neural network.
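The abstract above describes an actor-critic training step: compute a target output for each experience tuple with target networks, update the critic on the error between its output and the target, then update the actor through the critic. A minimal sketch of that loop follows, using linear function approximators and NumPy in place of the deep networks; all names and hyperparameters (`actor_w`, `critic_w`, `gamma`, `tau`, learning rates) are illustrative assumptions, not the patent's notation.

```python
# Sketch of one minibatch update for an actor-critic setup with target
# networks, using linear models so the gradients can be written by hand.
import numpy as np

rng = np.random.default_rng(0)
S, A = 4, 2                      # state and action dimensions (assumed)
gamma, tau = 0.99, 0.01          # discount factor, soft-update rate
lr_actor, lr_critic = 1e-3, 1e-2

# Linear actor mu(s) = W_a @ s and linear critic Q(s, a) = w_c . [s; a].
actor_w = rng.normal(scale=0.1, size=(A, S))
critic_w = rng.normal(scale=0.1, size=S + A)
target_actor_w = actor_w.copy()
target_critic_w = critic_w.copy()

def actor(w, s):
    return w @ s

def critic(w, s, a):
    return w @ np.concatenate([s, a])

def update(minibatch):
    """One training step on a minibatch of (s, a, r, s') experience tuples."""
    global actor_w, critic_w, target_actor_w, target_critic_w
    dc = np.zeros_like(critic_w)
    da = np.zeros_like(actor_w)
    for s, a, r, s2 in minibatch:
        # Target network output: r + gamma * Q_target(s', mu_target(s')).
        y = r + gamma * critic(target_critic_w, s2, actor(target_actor_w, s2))
        # Critic update uses the error between its output and the target.
        err = critic(critic_w, s, a) - y
        dc += err * np.concatenate([s, a])
        # Actor update goes through the critic: for the linear critic,
        # dQ/da is just the action part of its weights; chain through mu.
        dq_da = critic_w[S:]
        da += np.outer(dq_da, s)            # ascend Q(s, mu(s))
    n = len(minibatch)
    critic_w = critic_w - lr_critic * dc / n
    actor_w = actor_w + lr_actor * da / n
    # Slowly track the current networks with the target networks.
    target_critic_w = tau * critic_w + (1 - tau) * target_critic_w
    target_actor_w = tau * actor_w + (1 - tau) * target_actor_w
```

The soft target update (`tau` blending) is what keeps the targets slowly moving rather than copying the current parameters each step; the real method trains deep networks with backpropagation instead of these hand-written linear gradients.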

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Silver, David, Hunt, Jonathan James, Heess, Nicolas Manfred Otto, Erez, Tom, Pritzel, Alexander, Lillicrap, Timothy Paul, Wierstra, Daniel Pieter, Tassa, Yuval
Format: Patent
Language: eng
Subjects:
Online Access: Order full text
creator Silver, David
Hunt, Jonathan James
Heess, Nicolas Manfred Otto
Erez, Tom
Pritzel, Alexander
Lillicrap, Timothy Paul
Wierstra, Daniel Pieter
Tassa, Yuval
description Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an actor neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes obtaining a minibatch of experience tuples; and updating current values of the parameters of the actor neural network, comprising: for each experience tuple in the minibatch: processing the training observation and the training action in the experience tuple using a critic neural network to determine a neural network output for the experience tuple, and determining a target neural network output for the experience tuple; updating current values of the parameters of the critic neural network using errors between the target neural network outputs and the neural network outputs; and updating the current values of the parameters of the actor neural network using the critic neural network.
format Patent
creationdate 2023-10-31
patent US11803750B2
fulltext link https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20231031&DB=EPODOC&CC=US&NR=11803750B2
fulltext fulltext_linktorsrc
language eng
recordid cdi_epo_espacenet_US11803750B2
source esp@cenet
subjects CALCULATING
COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
COMPUTING
COUNTING
PHYSICS
title Continuous control with deep reinforcement learning