LEARNING METHOD, LEARNING DEVICE, CONTROL METHOD, CONTROL DEVICE, AND STORAGE MEDIUM
Main authors: | , |
---|---|
Format: | Patent |
Language: | English |
Subjects: | |
Abstract: | According to one embodiment, a learning method includes calculating a probability distribution indicating a distribution of a probability density or a distribution of a probability at which actions are selected, selecting a first action based on the probability distribution, causing a control target to execute the first action, receiving a reward and next observation data, calculating a probability density or a probability of the first action, correcting the reward, and updating the control parameter. The reward is corrected such that the reward increases as the probability density or probability decreases. |
---|
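
The steps listed in the abstract can be illustrated with a minimal sketch for a discrete action set. The abstract does not give the correction formula or the update rule, so the following are assumptions made only for illustration: the correction adds `-alpha * log(p)` to the reward (one of many functions that increase as the probability decreases), the control parameters are softmax logits, and the update is a REINFORCE-style gradient step. The names `softmax`, `corrected_reward`, `learning_step`, and `toy_env` are hypothetical and do not come from the patent.

```python
import math
import random


def softmax(logits):
    """Turn the control parameters (logits) into action probabilities."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def corrected_reward(reward, prob, alpha=0.1):
    """Assumed correction: add -alpha * log(prob), which grows as prob shrinks."""
    return reward - alpha * math.log(prob)


def learning_step(logits, env_step, rng, alpha=0.1, lr=0.05):
    """One iteration: select, act, observe, correct the reward, update."""
    probs = softmax(logits)                                     # probability distribution
    action = rng.choices(range(len(probs)), weights=probs)[0]   # first action
    reward, next_obs = env_step(action)                         # control target executes it
    p = probs[action]                                           # probability of that action
    r = corrected_reward(reward, p, alpha)                      # corrected reward
    # Assumed REINFORCE-style update: gradient of log pi(action) w.r.t. the logits.
    new_logits = [
        theta + lr * r * ((1.0 if i == action else 0.0) - probs[i])
        for i, theta in enumerate(logits)
    ]
    return new_logits, action, r, next_obs


def toy_env(action):
    """Hypothetical control target: action 1 pays 1.0, all others pay 0.0."""
    return (1.0 if action == 1 else 0.0), None


rng = random.Random(0)
logits = [0.0, 0.0, 0.0]
for _ in range(500):
    logits, _, _, _ = learning_step(logits, toy_env, rng)
final_probs = softmax(logits)
```

Under these assumptions, a rarely selected action earns a larger corrected reward than a frequently selected one for the same raw reward, which tends to encourage exploration. A continuous-action variant would use a probability density in place of `p`.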