LEARNING METHOD, LEARNING DEVICE, CONTROL METHOD, CONTROL DEVICE, AND STORAGE MEDIUM


Main authors: KANEKO, Toshimitsu; NONAKA, Ryosuke
Format: Patent
Language: English
Abstract: According to one embodiment, a learning method includes calculating a probability distribution indicating a distribution of a probability density or a distribution of a probability at which actions are selected, selecting a first action based on the probability distribution, causing a control target to execute the first action, receiving a reward and next observation data, calculating a probability density or a probability of the first action, correcting the reward, and updating the control parameter. The reward is corrected such that the reward increases as the probability density or probability decreases.
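
The steps listed in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation; the abstract does not specify the policy form, the correction formula, or the update rule. The sketch assumes a discrete action set, a linear softmax policy, a correction of the form reward - beta * log(probability), and a REINFORCE-style parameter update. The names `softmax`, `corrected_reward`, `learning_step`, `env_step`, `theta`, `beta`, and `lr` are hypothetical and chosen only for illustration.

```python
import math
import random


def softmax(logits):
    """Turn raw scores into a probability distribution over actions."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def corrected_reward(reward, prob, beta=0.1):
    """Assumed correction: add -beta * log(prob) to the reward.

    The bonus grows as prob shrinks, so a rarely selected action receives
    a larger corrected reward. The exact formula is an assumption.
    """
    return reward - beta * math.log(prob)


def learning_step(theta, obs, env_step, lr=0.05, beta=0.1, rng=random):
    """One pass through the steps named in the abstract.

    theta[a][i] is the weight of observation feature i for action a
    (the "control parameter" in this sketch). env_step is a hypothetical
    callable that executes an action on the control target and returns
    (reward, next_observation).
    """
    # 1. Calculate the probability distribution over actions.
    logits = [sum(w * x for w, x in zip(row, obs)) for row in theta]
    probs = softmax(logits)
    # 2. Select the first action based on that distribution.
    action = rng.choices(range(len(probs)), weights=probs)[0]
    # 3. Execute the action; receive the reward and next observation data.
    reward, next_obs = env_step(action)
    # 4. Probability of the selected action, then 5. correct the reward.
    r = corrected_reward(reward, probs[action], beta)
    # 6. Update the control parameter (assumed REINFORCE-style rule):
    #    d log pi(action) / d theta[b][i] = (1[b == action] - probs[b]) * obs[i]
    for b, row in enumerate(theta):
        g = (1.0 if b == action else 0.0) - probs[b]
        for i in range(len(row)):
            row[i] += lr * r * g * obs[i]
    return action, r, next_obs
```

With this assumed correction, an action selected with probability 0.1 receives a larger corrected reward than the same action selected with probability 0.9, which matches the property stated in the last sentence of the abstract. A continuous-action variant would use the probability density in place of the probability.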