Reward shaping method and device for sparse continuous control task, medium and terminal
Main authors: | , , , , , , , , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Summary: | The invention discloses a reward shaping method and device for sparse continuous control tasks, together with a medium and a terminal. The method comprises the steps of: collecting experience data generated by the interaction of an agent with a simulation environment, including the action executed by the agent, the current state information and the next state information of the simulation environment, and a simulated external reward signal; constructing a potential energy function network model using a fully connected neural network, and obtaining the potential energy values of the agent in the simulation environment in the current state and the next state; calculating an internal reward signal from the difference between the potential energy values through a reward shaping function, and combining the internal reward signal with the simulated external reward signal to obtain a final reward signal; updating the potential energy function network model by using a loss function, and adjusting |
---|
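
The reward computation described in the summary can be illustrated with a minimal sketch. The record does not give the exact reward shaping function, so the code below assumes the classic potential-based form, internal reward = gamma * Phi(next state) - Phi(current state), which may differ from the patented function. The class `PotentialNet`, the function `shaped_reward`, the layer sizes, the discount `gamma`, and the choice of a zero potential at terminal states are all hypothetical names and values chosen for illustration. The loss-based update of the network is omitted because the summary is cut off before describing it.

```python
import numpy as np


class PotentialNet:
    """Small fully connected network: state vector -> scalar potential value.

    Hypothetical stand-in for the potential energy function network model;
    sizes and initialization are assumptions, not taken from the patent.
    """

    def __init__(self, state_dim, hidden_dim=32, seed=0):
        rng = np.random.default_rng(seed)
        self.w1 = rng.normal(0.0, 0.1, size=(state_dim, hidden_dim))
        self.b1 = np.zeros(hidden_dim)
        self.w2 = rng.normal(0.0, 0.1, size=(hidden_dim, 1))
        self.b2 = np.zeros(1)

    def __call__(self, state):
        # One hidden layer with tanh activation, then a linear output.
        h = np.tanh(np.asarray(state, dtype=float) @ self.w1 + self.b1)
        return float((h @ self.w2 + self.b2)[0])


def shaped_reward(phi, state, next_state, external_reward, gamma=0.99, done=False):
    """Combine the external reward with a potential-difference internal reward.

    Assumed shaping function: gamma * phi(next_state) - phi(state).
    Assumption: the potential of a terminal next state is treated as zero.
    Returns (final_reward, internal_reward).
    """
    phi_s = phi(state)
    phi_next = 0.0 if done else phi(next_state)
    internal = gamma * phi_next - phi_s
    return external_reward + internal, internal


if __name__ == "__main__":
    phi = PotentialNet(state_dim=3)
    s = np.array([0.1, 0.2, 0.3])
    s_next = np.array([0.2, 0.1, 0.4])
    final, internal = shaped_reward(phi, s, s_next, external_reward=0.0)
    print("internal reward:", internal, "final reward:", final)
```

In a sparse-reward task the external reward is zero on most steps, so under this assumed form the final reward on those steps is just the potential difference, which gives the agent a learning signal between the rare external rewards.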