Ray-based distributed reinforcement learning method and device
The invention belongs to the technical field of reinforcement learning, and particularly relates to a Ray-based distributed reinforcement learning method and device. The method comprises the following steps: S1, receiving training data sent by a far-end sampling function arranged at each sampling node, and storing the training data in a buffer pool; S2, periodically polling the training data in the buffer pool and, once the total amount of training data meets the quantity requirement, notifying and waiting for all sampling nodes to end sampling; S3, obtaining model parameters, training the model on the collected training data, and returning the trained model parameters; and S4, emptying the buffer pool and repeating the sampling-and-training reinforcement learning cycle. This effectively improves the training performance of the reinforcement learning algorithm and shortens the training time.
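The loop described in steps S1-S4 maps naturally onto Ray's actor model. Below is a minimal, hypothetical Python sketch of such a loop using Ray's public actor API; the class names (`BufferPool`, `Sampler`), the dummy rollout data, and the placeholder `train()` update are illustrative assumptions and are not taken from the patent.

```python
# Minimal sketch of an S1-S4 style sampling/training loop on Ray.
# BufferPool, Sampler, the random rollout, and train() are hypothetical stand-ins.
import random
import ray

ray.init()


@ray.remote
class BufferPool:
    """Central buffer pool that collects transitions sent by remote samplers (S1)."""

    def __init__(self):
        self.data = []

    def add(self, batch):
        self.data.extend(batch)

    def size(self):
        return len(self.data)

    def drain(self):
        # Return everything collected so far and empty the pool (S4).
        batch, self.data = self.data, []
        return batch


@ray.remote
class Sampler:
    """Far-end sampling actor placed on a sampling node."""

    def __init__(self, buffer):
        self.buffer = buffer

    def run(self, params, steps=64):
        # Dummy rollout: real code would step an environment with a policy
        # built from `params` and record (state, action, reward) transitions.
        batch = [{"obs": random.random(), "reward": random.random()} for _ in range(steps)]
        # Block until the data has actually landed in the buffer pool.
        ray.get(self.buffer.add.remote(batch))


def train(params, batch):
    # Placeholder update (S3): real code would run gradient steps on the batch.
    return params + 0.01 * sum(t["reward"] for t in batch) / max(len(batch), 1)


buffer = BufferPool.remote()
samplers = [Sampler.remote(buffer) for _ in range(4)]
params = 0.0
MIN_SAMPLES = 512

for iteration in range(3):
    # S2: keep launching sampling rounds and polling the pool until enough data.
    while ray.get(buffer.size.remote()) < MIN_SAMPLES:
        ray.get([s.run.remote(params) for s in samplers])  # wait for all sampling nodes
    # S3: train on the collected data; S4: drain() empties the pool for the next cycle.
    batch = ray.get(buffer.drain.remote())
    params = train(params, batch)
    print(f"iteration {iteration}: trained on {len(batch)} samples, params={params:.4f}")
```

Calling `ray.get` inside `Sampler.run` makes each sampling round block until its data has actually been added to the buffer pool, so the driver's size poll in step S2 never misses freshly collected samples.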
Saved in:
Main authors: | FAN SONGYUAN; YU JIN; ZHAN GUANG; SUN ZHIXIAO; PIAO HAIYIN; HAN YUE; LANG KUIJUN; SUN YANG; PENG XUANQI; YANG SHENGQI |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; HANDLING RECORD CARRIERS; PHYSICS; PRESENTATION OF DATA; RECOGNITION OF DATA; RECORD CARRIERS |
Online access: | Order full text |
creator | FAN SONGYUAN; YU JIN; ZHAN GUANG; SUN ZHIXIAO; PIAO HAIYIN; HAN YUE; LANG KUIJUN; SUN YANG; PENG XUANQI; YANG SHENGQI |
description | The invention belongs to the technical field of reinforcement learning, and particularly relates to a Ray-based distributed reinforcement learning method and device. The method comprises the following steps: S1, receiving training data sent by a far-end sampling function arranged at each sampling node, and storing the training data in a buffer pool; S2, periodically polling the training data of the buffer pool, and after the sum of the training data meets the quantity requirement, notifying and waiting for all sampling nodes to end sampling; S3, obtaining model parameters, training the model based on the training data, and returning the trained model parameters; and S4, emptying the data of the buffer pool, and repeating the reinforcement learning process of sampling and training. The training effect of the reinforcement learning algorithm is effectively improved, and the training time is shortened. |
format | Patent |
fulltext | fulltext_linktorsrc |
language | chi ; eng |
recordid | cdi_epo_espacenet_CN113920388A |
source | esp@cenet |
subjects | CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; HANDLING RECORD CARRIERS; PHYSICS; PRESENTATION OF DATA; RECOGNITION OF DATA; RECORD CARRIERS |
title | Ray-based distributed reinforcement learning method and device |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-26T03%3A25%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=FAN%20SONGYUAN&rft.date=2022-01-11&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3ECN113920388A%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |