Ray-based distributed reinforcement learning method and device

The invention belongs to the technical field of reinforcement learning, and in particular relates to a Ray-based distributed reinforcement learning method and device. The method comprises the following steps: S1, receiving training data sent by a remote sampling function deployed at each sampling node, and storing the data in a buffer pool; S2, periodically polling the buffer pool and, once the accumulated training data meets the quantity requirement, notifying all sampling nodes to end sampling and waiting for them to finish; S3, obtaining the model parameters, training the model on the buffered data, and returning the trained parameters; and S4, emptying the buffer pool and repeating the sampling-and-training reinforcement learning cycle. The method effectively improves the training performance of the reinforcement learning algorithm and shortens the training time.
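The patent gives no code, but the S1-S4 loop maps naturally onto Ray's task model. The following is a minimal sketch in Python; every name in it (remote_sample, train, BATCH_SIZE, NUM_SAMPLERS, the placeholder transition fields) is an illustrative assumption, not taken from the patent.

    import ray

    ray.init()

    NUM_SAMPLERS = 4      # assumed number of sampling nodes
    BATCH_SIZE = 1024     # assumed "quantity requirement" of step S2
    NUM_ITERATIONS = 10   # assumed number of sample-train cycles

    @ray.remote
    def remote_sample(params, n=64):
        # S1: remote sampling function on a sampling node. A real version
        # would roll out the policy defined by `params` in the environment;
        # the transitions here are placeholders.
        return [{"obs": None, "action": None, "reward": 0.0} for _ in range(n)]

    def train(params, batch):
        # S3: placeholder for one training pass over the collected batch,
        # returning the updated model parameters.
        return params

    params = {}  # initial model parameters

    for iteration in range(NUM_ITERATIONS):
        buffer = []  # buffer pool that receives sampled data (S1)
        tasks = [remote_sample.remote(params) for _ in range(NUM_SAMPLERS)]
        # S2: poll for finished sampling tasks until the buffer meets the quota.
        while len(buffer) < BATCH_SIZE:
            done, tasks = ray.wait(tasks, num_returns=1)
            buffer.extend(ray.get(done[0]))
            tasks.append(remote_sample.remote(params))
        # Quota met: wait for the remaining in-flight sampling tasks to end.
        for chunk in ray.get(tasks):
            buffer.extend(chunk)
        # S3: train on the buffered data and keep the returned parameters.
        params = train(params, buffer)
        # S4: `buffer` is re-created empty at the top of the next iteration,
        # which empties the pool before sampling resumes.

In a fuller implementation the buffer pool and the end-of-sampling notification would typically live in dedicated Ray actors so that samplers can push data asynchronously; the list-plus-ray.wait pattern above stands in for both.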

Bibliographic details
Authors: FAN SONGYUAN, YU JIN, ZHAN GUANG, SUN ZHIXIAO, PIAO HAIYIN, HAN YUE, LANG KUIJUN, SUN YANG, PENG XUANQI, YANG SHENGQI
Format: Patent
Patent number: CN113920388A
Publication date: 2022-01-11
Source: esp@cenet
Language: Chinese; English
Keywords: CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; HANDLING RECORD CARRIERS; PHYSICS; PRESENTATION OF DATA; RECOGNITION OF DATA; RECORD CARRIERS
Online access: Full text at https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20220111&DB=EPODOC&CC=CN&NR=113920388A