Reward shaping method and device for sparse continuous control task, medium and terminal

Bibliographic details
Main authors:	WANG QIANYU, SHANG ZHAOWEI, XIANG TAO, FENG YONG, PU HUAYAN, LUO JUN, YAN JIELU, ZHOU MINGLIANG, WEI XUEKAI, LIN JIAWEI, FANG BIN
Format:	Patent
Language:	Chinese; English
Subjects:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; ELECTRIC DIGITAL DATA PROCESSING; PHYSICS

Description
Summary:	The invention discloses a reward shaping method and device for sparse continuous control tasks, together with a medium and a terminal. The method comprises the following steps: collecting experience data generated by the interaction between an agent and a simulation environment, including the action executed by the agent, the current-state and next-state information of the simulation environment, and the external reward signal from the simulation; constructing a potential function network model using a fully connected neural network, and obtaining the potential values of the agent in the simulation environment for the current state and the next state; calculating an internal reward signal from the difference between the potential values through a reward shaping function, and combining the internal reward signal with the external reward signal to obtain a final reward signal; updating the potential function network model using a loss function, and adjusting
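
The reward-combination step described in the summary can be illustrated with a minimal Python sketch. It is not the patented implementation. The record does not state the shaping function, so the sketch assumes the common discounted potential difference, gamma * phi(s') - phi(s). The names `PotentialNetwork` and `shaped_reward`, the layer sizes, the `tanh` activation and the value of `gamma` are all hypothetical choices made for illustration. The network update is left out because the record does not describe the loss function.

```python
import math
import random


class PotentialNetwork:
    """Tiny fully connected network mapping a state vector to a scalar potential.

    Hypothetical stand-in for the potential function network model; the
    architecture is an assumption, not taken from the patent.
    """

    def __init__(self, state_dim, hidden_dim=8, seed=0):
        rng = random.Random(seed)
        # One hidden layer with small random weights (illustrative only).
        self.w1 = [[rng.uniform(-0.5, 0.5) for _ in range(state_dim)]
                   for _ in range(hidden_dim)]
        self.b1 = [0.0] * hidden_dim
        self.w2 = [rng.uniform(-0.5, 0.5) for _ in range(hidden_dim)]
        self.b2 = 0.0

    def potential(self, state):
        # Hidden layer with tanh activation, then a linear scalar output.
        hidden = [math.tanh(sum(w * x for w, x in zip(row, state)) + b)
                  for row, b in zip(self.w1, self.b1)]
        return sum(w * h for w, h in zip(self.w2, hidden)) + self.b2


def shaped_reward(net, state, next_state, external_reward, gamma=0.99):
    """Return (internal_reward, final_reward) for one transition.

    Assumed shaping function: gamma * phi(next_state) - phi(state).
    The final reward is the external reward plus the internal reward.
    """
    internal = gamma * net.potential(next_state) - net.potential(state)
    return internal, external_reward + internal


# Usage: one transition from a sparse task where the external reward is zero.
net = PotentialNetwork(state_dim=3)
state, next_state = [0.0, 0.1, -0.2], [0.05, 0.1, -0.1]
internal, final = shaped_reward(net, state, next_state, external_reward=0.0)
print(internal, final)
```

With a zero external reward, as is typical for most steps of a sparse task, the final reward equals the internal reward, so the agent still receives a learning signal on those steps.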