Ray-based distributed reinforcement learning method and device

The invention belongs to the technical field of reinforcement learning, and in particular relates to a Ray-based distributed reinforcement learning method and device. The method comprises the following steps: S1, receiving training data sent by a remote sampling function deployed at each sampling node, and storing the data in a buffer pool; S2, periodically polling the buffer pool and, once the accumulated training data meets the quantity requirement, notifying all sampling nodes to end sampling and waiting for them to finish; S3, obtaining the model parameters, training the model on the buffered data, and returning the trained parameters; and S4, emptying the buffer pool and repeating the sampling-and-training reinforcement learning cycle. The method effectively improves the training performance of the reinforcement learning algorithm and shortens the training time.
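The patent gives no code, but the S1-S4 loop maps naturally onto Ray's task model. The following is a minimal sketch in Python; every name in it (remote_sample, train, BATCH_SIZE, NUM_SAMPLERS, the placeholder transition fields) is an illustrative assumption, not taken from the patent.

    import ray

    ray.init()

    NUM_SAMPLERS = 4      # assumed number of sampling nodes
    BATCH_SIZE = 1024     # assumed "quantity requirement" of step S2
    NUM_ITERATIONS = 10   # assumed number of sample-train cycles

    @ray.remote
    def remote_sample(params, n=64):
        # S1: remote sampling function on a sampling node. A real version
        # would roll out the policy defined by `params` in the environment;
        # the transitions here are placeholders.
        return [{"obs": None, "action": None, "reward": 0.0} for _ in range(n)]

    def train(params, batch):
        # S3: placeholder for one training pass over the collected batch,
        # returning the updated model parameters.
        return params

    params = {}  # initial model parameters

    for iteration in range(NUM_ITERATIONS):
        buffer = []  # buffer pool that receives sampled data (S1)
        tasks = [remote_sample.remote(params) for _ in range(NUM_SAMPLERS)]
        # S2: poll for finished sampling tasks until the buffer meets the quota.
        while len(buffer) < BATCH_SIZE:
            done, tasks = ray.wait(tasks, num_returns=1)
            buffer.extend(ray.get(done[0]))
            tasks.append(remote_sample.remote(params))
        # Quota met: wait for the remaining in-flight sampling tasks to end.
        for chunk in ray.get(tasks):
            buffer.extend(chunk)
        # S3: train on the buffered data and keep the returned parameters.
        params = train(params, buffer)
        # S4: `buffer` is re-created empty at the top of the next iteration,
        # which empties the pool before sampling resumes.

In a fuller implementation the buffer pool and the end-of-sampling notification would typically live in dedicated Ray actors so that samplers can push data asynchronously; the list-plus-ray.wait pattern above stands in for both.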

Bibliographic details
Authors: FAN SONGYUAN, YU JIN, ZHAN GUANG, SUN ZHIXIAO, PIAO HAIYIN, HAN YUE, LANG KUIJUN, SUN YANG, PENG XUANQI, YANG SHENGQI
Format: Patent
Patent number: CN113920388A
Publication date: 2022-01-11
Source: esp@cenet
Language: Chinese; English
Keywords: CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; HANDLING RECORD CARRIERS; PHYSICS; PRESENTATION OF DATA; RECOGNITION OF DATA; RECORD CARRIERS
Online access: Full text at https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20220111&DB=EPODOC&CC=CN&NR=113920388A