Ray-based distributed reinforcement learning method and device

Bibliographic details
Main authors:	FAN SONGYUAN, YU JIN, ZHAN GUANG, SUN ZHIXIAO, PIAO HAIYIN, HAN YUE, LANG KUIJUN, SUN YANG, PENG XUANQI, YANG SHENGQI
Format:	Patent
Language:	Chinese; English
Subjects:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; HANDLING RECORD CARRIERS; PHYSICS; PRESENTATION OF DATA; RECOGNITION OF DATA; RECORD CARRIERS

Description
Summary:	The invention belongs to the technical field of reinforcement learning and relates in particular to a Ray-based distributed reinforcement learning method and device. The method comprises the following steps: S1, receiving training data sent by the remote sampling function deployed on each sampling node, and storing the training data in a buffer pool; S2, periodically polling the buffer pool and, once the total amount of training data meets the required quantity, notifying all sampling nodes to stop sampling and waiting for them to finish; S3, obtaining the model parameters, training the model on the training data, and returning the trained model parameters; and S4, emptying the buffer pool and repeating the sampling-and-training cycle. The method improves the training effect of the reinforcement learning algorithm and shortens the training time.
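
The S1 to S4 cycle in the summary can be illustrated with a minimal sketch. This is not the patented implementation: standard-library threads stand in for Ray remote sampling functions so that the example runs without a cluster, and the names `BufferPool`, `sampler`, `train`, and `run`, the toy update rule, and all default values are hypothetical.

```python
import random
import threading
import time


class BufferPool:
    """Thread-safe store for training data sent by the samplers (S1)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._items = []

    def add(self, batch):
        with self._lock:
            self._items.extend(batch)

    def size(self):
        with self._lock:
            return len(self._items)

    def snapshot(self):
        with self._lock:
            return list(self._items)

    def clear(self):
        with self._lock:
            self._items.clear()


def sampler(pool, stop, batch_size=4):
    """Stand-in for a remote sampling function on one sampling node."""
    while not stop.is_set():
        # Assumption: each sample is a single float drawn from [0, 1).
        pool.add([random.random() for _ in range(batch_size)])
        time.sleep(0.001)


def train(params, data, lr=0.5):
    """Toy update (assumption): move the weight toward the sample mean."""
    mean = sum(data) / len(data)
    return {"w": params["w"] + lr * (mean - params["w"])}


def run(num_samplers=3, required=50, iterations=2, poll_interval=0.005):
    params = {"w": 0.0}
    pool = BufferPool()
    history = []  # number of samples used in each training round
    for _ in range(iterations):
        stop = threading.Event()
        workers = [
            threading.Thread(target=sampler, args=(pool, stop))
            for _ in range(num_samplers)
        ]
        for worker in workers:
            worker.start()
        # S2: poll the buffer pool until enough data has arrived,
        # then notify the samplers and wait for all of them to end.
        while pool.size() < required:
            time.sleep(poll_interval)
        stop.set()
        for worker in workers:
            worker.join()
        # S3: train on the collected data and keep the new parameters.
        data = pool.snapshot()
        params = train(params, data)
        history.append(len(data))
        # S4: empty the buffer pool before the next sampling round.
        pool.clear()
    return params, history, pool.size()
```

Calling `run()` returns the final parameters, the number of samples consumed in each round, and the remaining buffer size. In a Ray deployment the sampler threads would presumably be replaced by remote functions or actors running on the sampling nodes, with the buffer pool held by the learner process.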