Continuous control with deep reinforcement learning

Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an actor neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes obtaining a minibatch of experience tuples; and updating current values of the parameters of the actor neural network, comprising: for each experience tuple in the minibatch: processing the training observation and the training action in the experience tuple using a critic neural network to determine a neural network output for the experience tuple, and determining a target neural network output for the experience tuple; updating current values of the parameters of the critic neural network using errors between the target neural network outputs and the neural network outputs; and updating the current values of the parameters of the actor neural network using the critic neural network.
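The abstract above describes an actor-critic training step: compute a target output for each experience tuple with target networks, update the critic on the error between its output and the target, then update the actor through the critic. A minimal sketch of that loop follows, using linear function approximators and NumPy in place of the deep networks; all names and hyperparameters (`actor_w`, `critic_w`, `gamma`, `tau`, learning rates) are illustrative assumptions, not the patent's notation.

```python
# Sketch of one minibatch update for an actor-critic setup with target
# networks, using linear models so the gradients can be written by hand.
import numpy as np

rng = np.random.default_rng(0)
S, A = 4, 2                      # state and action dimensions (assumed)
gamma, tau = 0.99, 0.01          # discount factor, soft-update rate
lr_actor, lr_critic = 1e-3, 1e-2

# Linear actor mu(s) = W_a @ s and linear critic Q(s, a) = w_c . [s; a].
actor_w = rng.normal(scale=0.1, size=(A, S))
critic_w = rng.normal(scale=0.1, size=S + A)
target_actor_w = actor_w.copy()
target_critic_w = critic_w.copy()

def actor(w, s):
    return w @ s

def critic(w, s, a):
    return w @ np.concatenate([s, a])

def update(minibatch):
    """One training step on a minibatch of (s, a, r, s') experience tuples."""
    global actor_w, critic_w, target_actor_w, target_critic_w
    dc = np.zeros_like(critic_w)
    da = np.zeros_like(actor_w)
    for s, a, r, s2 in minibatch:
        # Target network output: r + gamma * Q_target(s', mu_target(s')).
        y = r + gamma * critic(target_critic_w, s2, actor(target_actor_w, s2))
        # Critic update uses the error between its output and the target.
        err = critic(critic_w, s, a) - y
        dc += err * np.concatenate([s, a])
        # Actor update goes through the critic: for the linear critic,
        # dQ/da is just the action part of its weights; chain through mu.
        dq_da = critic_w[S:]
        da += np.outer(dq_da, s)            # ascend Q(s, mu(s))
    n = len(minibatch)
    critic_w = critic_w - lr_critic * dc / n
    actor_w = actor_w + lr_actor * da / n
    # Slowly track the current networks with the target networks.
    target_critic_w = tau * critic_w + (1 - tau) * target_critic_w
    target_actor_w = tau * actor_w + (1 - tau) * target_actor_w
```

The soft target update (`tau` blending) is what keeps the targets slowly moving rather than copying the current parameters each step; the real method trains deep networks with backpropagation instead of these hand-written linear gradients.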

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Silver, David, Hunt, Jonathan James, Heess, Nicolas Manfred Otto, Erez, Tom, Pritzel, Alexander, Lillicrap, Timothy Paul, Wierstra, Daniel Pieter, Tassa, Yuval
Format: Patent
Language: eng
Subjects:
Online Access: Order full text
creator Silver, David
Hunt, Jonathan James
Heess, Nicolas Manfred Otto
Erez, Tom
Pritzel, Alexander
Lillicrap, Timothy Paul
Wierstra, Daniel Pieter
Tassa, Yuval
description Methods, systems, and apparatus, including computer programs encoded on computer storage media, for training an actor neural network used to select actions to be performed by an agent interacting with an environment. One of the methods includes obtaining a minibatch of experience tuples; and updating current values of the parameters of the actor neural network, comprising: for each experience tuple in the minibatch: processing the training observation and the training action in the experience tuple using a critic neural network to determine a neural network output for the experience tuple, and determining a target neural network output for the experience tuple; updating current values of the parameters of the critic neural network using errors between the target neural network outputs and the neural network outputs; and updating the current values of the parameters of the actor neural network using the critic neural network.
format Patent
creationdate 2023-10-31
patent US11803750B2
fulltext link https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20231031&DB=EPODOC&CC=US&NR=11803750B2
fulltext fulltext_linktorsrc
language eng
recordid cdi_epo_espacenet_US11803750B2
source esp@cenet
subjects CALCULATING
COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS
COMPUTING
COUNTING
PHYSICS
title Continuous control with deep reinforcement learning