Reward shaping method and device for sparse continuous control task, medium and terminal

Bibliographic details
Main authors: WANG QIANYU, SHANG ZHAOWEI, XIANG TAO, FENG YONG, PU HUAYAN, LUO JUN, YAN JIELU, ZHOU MINGLIANG, WEI XUEKAI, LIN JIAWEI, FANG BIN
Format: Patent
Language: chi; eng
Description
Summary: The invention discloses a reward shaping method and device for sparse continuous control tasks, together with a medium and a terminal. The method comprises the following steps: collecting experience data generated by the interaction of an agent with a simulation environment, including the action executed by the agent, the current-state and next-state information of the simulation environment, and a simulated external reward signal; constructing a potential function network model using a fully connected neural network, and obtaining the agent's potential values in the current state and in the next state of the simulation environment; calculating an internal reward signal from the difference between these potential values through a reward shaping function, and combining the internal reward signal with the simulated external reward signal to obtain a final reward signal; updating the potential function network model using a loss function, and adjusting
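
The reward-combination step described in the summary can be illustrated with a minimal sketch. The record does not give the exact reward shaping function, so the code below assumes the classic potential-based form, intrinsic = gamma * phi(s') - phi(s), added to the external reward. The names `init_potential_net`, `potential`, and `shaped_reward`, the network size, and the parameters `gamma` and `scale` are all hypothetical. The update of the potential network is left out because the summary is cut off before the loss function is specified.

```python
import math
import random


def init_potential_net(state_dim, hidden_dim=16, seed=0):
    """Randomly initialise a small fully connected potential network.

    Assumption: one tanh hidden layer and a scalar linear output. The
    record only says "fully connected neural network".
    """
    rng = random.Random(seed)
    return {
        "W1": [[rng.gauss(0.0, 0.1) for _ in range(state_dim)]
               for _ in range(hidden_dim)],
        "b1": [0.0] * hidden_dim,
        "W2": [rng.gauss(0.0, 0.1) for _ in range(hidden_dim)],
        "b2": 0.0,
    }


def potential(params, state):
    """Return the scalar potential phi(state) for one state vector."""
    hidden = [
        math.tanh(sum(w * x for w, x in zip(row, state)) + b)
        for row, b in zip(params["W1"], params["b1"])
    ]
    return sum(w * h for w, h in zip(params["W2"], hidden)) + params["b2"]


def shaped_reward(params, state, next_state, extrinsic, gamma=0.99, scale=1.0):
    """Combine the external reward with a potential-difference bonus.

    Assumption: intrinsic = gamma * phi(s') - phi(s). The patent's own
    shaping function may differ.
    """
    intrinsic = gamma * potential(params, next_state) - potential(params, state)
    return extrinsic + scale * intrinsic


if __name__ == "__main__":
    params = init_potential_net(state_dim=3)
    s, s_next = [0.2, -0.1, 0.4], [0.3, 0.0, 0.5]
    # Sparse task: the external reward is zero on most steps, so the
    # potential difference supplies the only learning signal here.
    print(shaped_reward(params, s, s_next, extrinsic=0.0))
```

With this form, the final reward on a step with no external reward is the scaled, discounted change in potential between the current state and the next state.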