Intelligent decision-making method and system for realizing continuous action decision-making based on GP and PPO

Bibliographic details
Main authors: FANG WENQI, LUAN SHAOTONG, GE PIN, TAIRA HIROSHI, SHEN YUANYUAN, DAI YINGFENG, JIN XINZHU, MIAO ZHENGYUAN, WU GUANLIN
Format: Patent
Language: Chinese; English
Description
Summary: The invention discloses an intelligent decision-making method and system that realize continuous action decision-making based on GP and PPO. The system comprises a world model, a strategy (policy) model, and an experience pool. The world model is GP-based, and the simulated experience it generates is stored in the experience pool. The strategy model comprises a PPO (proximal policy optimization) algorithm, which uses the simulated experience in the experience pool for reinforcement learning. In the GP-based Dyna-PPO method, the DQN (deep Q-network) algorithm of the Dyna-Q framework is replaced with an optimized PPO algorithm. The improved framework combines the advantages of model-free and model-based deep reinforcement learning (DRL) schemes and can solve decision-making problems with continuous actions, thereby achieving continuous action decision-making within the Dyna framework.
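
The data flow described in the summary (world model generates simulated experience, the experience pool stores it, PPO learns from it) can be illustrated with a minimal sketch. This is not the patented implementation. It assumes that GP denotes a Gaussian process, and every name in it (`GPWorldModel`, `gaussian_log_prob`, `ppo_clip_objective`), the one-dimensional toy dynamics, the reward, and the linear Gaussian policy are hypothetical choices made for illustration. Only the GP posterior mean and the standard PPO clipped surrogate objective are shown; the optimization of the policy parameters is omitted.

```python
from collections import deque

import numpy as np


class GPWorldModel:
    """Toy GP regressor (RBF kernel) mapping (state, action) to next state."""

    def __init__(self, length_scale=1.0, noise=1e-3):
        self.length_scale = length_scale
        self.noise = noise

    def _kernel(self, a, b):
        # Pairwise squared Euclidean distances between rows of a and b.
        sq_dist = ((a[:, None, :] - b[None, :, :]) ** 2).sum(axis=-1)
        return np.exp(-0.5 * sq_dist / self.length_scale**2)

    def fit(self, inputs, targets):
        self.train_inputs = inputs
        gram = self._kernel(inputs, inputs) + self.noise * np.eye(len(inputs))
        self.alpha = np.linalg.solve(gram, targets)

    def predict(self, inputs):
        # GP posterior mean; the predictive variance is omitted for brevity.
        return self._kernel(inputs, self.train_inputs) @ self.alpha


def gaussian_log_prob(action, mean, std=0.5):
    """Log-density of a Gaussian policy with fixed standard deviation."""
    return -0.5 * ((action - mean) / std) ** 2 - np.log(std) - 0.5 * np.log(2 * np.pi)


def ppo_clip_objective(ratio, advantage, eps=0.2):
    """PPO clipped surrogate objective, averaged over the batch."""
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps)
    return np.minimum(ratio * advantage, clipped * advantage).mean()


rng = np.random.default_rng(0)

# "Real" experience from assumed toy dynamics: s' = s + 0.1 * a.
states = rng.uniform(-1, 1, size=(50, 1))
actions = rng.uniform(-1, 1, size=(50, 1))
next_states = states + 0.1 * actions

# Fit the GP world model on the real transitions.
model = GPWorldModel()
model.fit(np.hstack([states, actions]), next_states)

# Generate simulated experience with the world model and store it in the pool.
pool = deque(maxlen=1000)
sim_s = rng.uniform(-1, 1, size=(20, 1))
sim_a = rng.uniform(-1, 1, size=(20, 1))
sim_next = model.predict(np.hstack([sim_s, sim_a]))
for s, a, s2 in zip(sim_s, sim_a, sim_next):
    reward = -float(s2[0]) ** 2  # assumed reward: stay close to the origin
    pool.append((float(s[0]), float(a[0]), reward, float(s2[0])))

# Evaluate the PPO objective on the pooled, simulated batch.
batch = list(pool)
s = np.array([t[0] for t in batch])
a = np.array([t[1] for t in batch])
r = np.array([t[2] for t in batch])
advantage = r - r.mean()  # crude baseline, standing in for a learned critic

theta_old, theta_new = 0.0, -0.1  # hypothetical linear policy: mean = theta * s
ratio = np.exp(gaussian_log_prob(a, theta_new * s) - gaussian_log_prob(a, theta_old * s))
objective = ppo_clip_objective(ratio, advantage)
print("clipped surrogate objective:", objective)
```

Because the policy here outputs a real-valued action through a Gaussian density, the same objective applies to continuous action spaces, which is the property the summary attributes to replacing the DQN component with PPO.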