Intelligent decision-making method and system for realizing continuous action decision-making based on GP and PPO

Bibliographic details
Main authors:	FANG WENQI, LUAN SHAOTONG, GE PIN, TAIRA HIROSHI, SHEN YUANYUAN, DAI YINGFENG, JIN XINZHU, MIAO ZHENGYUAN, WU GUANLIN
Format:	Patent
Language:	Chinese; English
Keywords:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

Description
Abstract:	The invention discloses an intelligent decision-making method and system for realizing continuous action decision-making based on GP and PPO. The system comprises a world model, a strategy model and an experience pool. The world model is based on GP, and the simulation experience it generates is stored in the experience pool. The strategy model comprises a PPO algorithm, which uses the simulation experience in the experience pool to carry out reinforcement learning. In the proposed GP-based Dyna-PPO method, the DQN algorithm in the Dyna-Q framework is replaced with an optimized PPO algorithm. The improved framework combines the advantages of model-free and model-based DRL schemes and can be used to solve decision-making problems involving continuous actions, thereby achieving continuous action decision-making within the Dyna framework.
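The abstract describes a Dyna-style loop: a GP world model generates simulated transitions, these are stored in an experience pool, and a PPO policy learns from them. The following is a minimal sketch of those pieces in Python with NumPy, under stated assumptions: GP is taken to mean Gaussian-process regression (posterior mean with an RBF kernel, fitted here on toy dynamics), PPO is taken to be Proximal Policy Optimization with its standard clipped surrogate objective and a diagonal Gaussian policy for continuous actions, and the names `GPWorldModel`, `simulate_into_pool`, `gaussian_logp` and `ppo_clip_objective` are hypothetical. The abstract does not specify the "optimized" PPO variant, the kernel, or the training schedule, so this is an illustration of the general technique, not the patented method; the policy's gradient update is omitted.

```python
# Illustrative sketch only: all names are hypothetical, not taken from the patent.
from collections import deque

import numpy as np


class GPWorldModel:
    """Hypothetical GP world model: predicts the next state from (state, action)
    using the posterior mean of Gaussian-process regression with an RBF kernel."""

    def __init__(self, length_scale=1.0, noise=1e-4):
        self.length_scale = length_scale
        self.noise = noise  # observation-noise variance added to the kernel diagonal

    def _kernel(self, a, b):
        # Squared-exponential (RBF) kernel between the rows of a and the rows of b.
        sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2 * a @ b.T
        return np.exp(-0.5 * np.maximum(sq, 0.0) / self.length_scale**2)

    def fit(self, inputs, targets):
        # inputs: (n, state_dim + action_dim); targets: (n, state_dim) next states.
        self.x_train = inputs
        k = self._kernel(inputs, inputs) + self.noise * np.eye(len(inputs))
        self.alpha = np.linalg.solve(k, targets)

    def predict(self, state, action):
        # Posterior mean for one state-action pair; returns a (state_dim,) array.
        x = np.concatenate([state, action])[None, :]
        return (self._kernel(x, self.x_train) @ self.alpha)[0]


def simulate_into_pool(model, policy, reward_fn, start_state, horizon, pool):
    """Dyna-style planning step: roll the policy out inside the world model
    and store the simulated transitions in the experience pool."""
    state = start_state
    for _ in range(horizon):
        action = policy(state)
        next_state = model.predict(state, action)
        pool.append((state, action, reward_fn(state, action), next_state))
        state = next_state


def gaussian_logp(action, mean, log_std):
    # Log-density of a diagonal Gaussian policy, a common choice for continuous actions.
    std = np.exp(log_std)
    return np.sum(-0.5 * ((action - mean) / std) ** 2 - log_std - 0.5 * np.log(2 * np.pi), axis=-1)


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    # Standard PPO clipped surrogate objective (maximized by the policy update).
    ratio = np.exp(new_logp - old_logp)
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.mean(np.minimum(ratio * advantages, clipped * advantages))


# Usage: fit the GP on "real" transitions of the toy dynamics s' = s + a.
grid = np.linspace(-1.0, 1.0, 5)
inputs = np.array([[s, a] for s in grid for a in grid])
targets = inputs.sum(axis=1, keepdims=True)
model = GPWorldModel()
model.fit(inputs, targets)

# Generate simulated experience with a constant toy policy.
pool = deque(maxlen=10_000)
simulate_into_pool(model, policy=lambda s: np.array([0.1]),
                   reward_fn=lambda s, a: -float(s[0] ** 2),
                   start_state=np.array([0.0]), horizon=5, pool=pool)
```

Only the GP posterior mean is used here; a GP also provides a predictive variance, which a model-based scheme could use to cut rollouts short where the model is uncertain. In a full training loop, the policy parameters would be updated to maximize `ppo_clip_objective` on batches sampled from `pool`, with `gaussian_logp` supplying the new and old log-probabilities.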