Intelligent decision-making method and system for realizing continuous action decision-making based on GP and PPO
Main authors: | , , , , , , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Subjects: | |
Abstract: | The invention discloses an intelligent decision-making method and system for continuous action decision-making based on GP and PPO. The system comprises a world model, a strategy model and an experience pool. The world model is GP-based, and the simulated experience it generates is stored in the experience pool. The strategy model comprises a PPO algorithm, which uses the simulated experience in the experience pool for reinforcement learning. In the GP-based Dyna-PPO method, the DQN algorithm of the Dyna-Q framework is replaced with an optimized PPO algorithm. The improved framework combines the advantages of model-free and model-based DRL schemes and can solve decision-making problems with continuous actions, thereby achieving continuous action decision-making within the Dyna framework.
本发明公开了一种基于GP与PPO实现连续性动作决策的智能决策方法和系统,包括世界模型、策略模型和经验池,由世界模型生成的模拟经验被存入 |
---|
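The architecture named in the abstract (a GP-based world model, an experience pool, and a PPO learner that consumes simulated experience) can be illustrated with a minimal sketch. This is not the patented implementation. It assumes that "GP" stands for Gaussian process regression, uses only the GP posterior mean, and shows the standard PPO clipped surrogate objective rather than the "optimized PPO" the record mentions, whose details are not given here. The class `GPWorldModel`, the helpers `rbf`, `solve` and `ppo_clip_objective`, the toy dynamics `s' = s + 0.1 * a`, and the reward `-|s'|` are all hypothetical names and choices made for illustration.

```python
import math
import random
from collections import deque

def rbf(a, b, length=1.0):
    """Squared-exponential kernel between two input vectors."""
    return math.exp(-sum((p - q) ** 2 for p, q in zip(a, b)) / (2 * length ** 2))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

class GPWorldModel:
    """Hypothetical GP world model: (state, action) -> next state, posterior mean only."""
    def __init__(self, noise=1e-4):
        self.noise = noise

    def fit(self, inputs, targets):
        self.inputs = inputs
        K = [[rbf(a, b) + (self.noise if i == j else 0.0)
              for j, b in enumerate(inputs)] for i, a in enumerate(inputs)]
        self.alpha = solve(K, targets)  # alpha = (K + noise * I)^-1 y

    def predict(self, x):
        return sum(rbf(x, xi) * a for xi, a in zip(self.inputs, self.alpha))

def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean PPO clipped surrogate: min(r * A, clip(r, 1 - eps, 1 + eps) * A)."""
    terms = []
    for r, adv in zip(ratios, advantages):
        clipped = max(1 - eps, min(1 + eps, r))
        terms.append(min(r * adv, clipped * adv))
    return sum(terms) / len(terms)

# Real experience from an assumed toy environment: s' = s + 0.1 * a.
grid = [-1.0, -0.5, 0.0, 0.5, 1.0]
real = [((s, a), s + 0.1 * a) for s in grid for a in grid]

model = GPWorldModel()
model.fit([x for x, _ in real], [y for _, y in real])

# Dyna-style planning step: the world model fills the pool with simulated transitions.
random.seed(0)
pool = deque(maxlen=1000)
for _ in range(50):
    s = random.uniform(-1.0, 1.0)
    a = random.uniform(-1.0, 1.0)   # continuous action
    s_next = model.predict((s, a))
    reward = -abs(s_next)           # assumed reward: stay near the origin
    pool.append((s, a, reward, s_next))
```

In a full Dyna-style loop, batches drawn from `pool` would supply the probability ratios and advantage estimates passed to `ppo_clip_objective`, and the policy parameters would be updated by gradient ascent on that objective. Those steps are omitted here to keep the sketch dependency-free.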