LEARNING METHOD, LEARNING DEVICE, CONTROL METHOD, CONTROL DEVICE, AND STORAGE MEDIUM


Bibliographic details
Main authors:	KANEKO, Toshimitsu; NONAKA, Ryosuke
Format:	Patent
Language:	English
Subjects:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; PHYSICS

Description
Abstract:	According to one embodiment, a learning method includes calculating a probability distribution indicating a distribution of a probability density or a distribution of a probability at which actions are selected, selecting a first action based on the probability distribution, causing a control target to execute the first action, receiving a reward and next observation data, calculating a probability density or a probability of the first action, correcting the reward, and updating a control parameter. The reward is corrected such that the reward increases as the probability density or probability decreases.
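
The steps listed in the abstract can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the patent: the softmax policy over a discrete action set, the correction formula `reward - alpha * log(prob)`, the REINFORCE-style update, and the names `softmax`, `corrected_reward`, `learning_step`, `prefs`, `env_step`, `alpha`, and `lr`. The abstract only requires that the corrected reward grow as the probability (or probability density) of the selected action shrinks; the log-probability bonus is one common way to obtain that behavior. For brevity the policy ignores the observation data.

```python
import math
import random


def softmax(prefs):
    """Turn action preferences into a probability distribution over actions."""
    m = max(prefs)  # subtract the max for numerical stability
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def corrected_reward(reward, prob, alpha=0.1):
    """Hypothetical correction: the bonus -alpha*log(prob) grows as prob shrinks."""
    return reward - alpha * math.log(prob)


def learning_step(prefs, env_step, alpha=0.1, lr=0.05, rng=random):
    """One pass through the steps named in the abstract (illustrative only).

    prefs    -- control parameters (here: one preference per discrete action)
    env_step -- hypothetical callable: action -> (reward, next_observation)
    """
    probs = softmax(prefs)                                # probability distribution
    action = rng.choices(range(len(prefs)), weights=probs)[0]  # select first action
    reward, next_obs = env_step(action)                   # control target executes it
    prob = probs[action]                                  # probability of that action
    r = corrected_reward(reward, prob, alpha)             # correct the reward
    # REINFORCE-style update (an assumed choice): gradient of log softmax
    new_prefs = [
        p + lr * r * ((1.0 if i == action else 0.0) - probs[i])
        for i, p in enumerate(prefs)
    ]
    return new_prefs, action, r, next_obs
```

With this correction, an action selected at probability 0.1 receives a larger corrected reward than the same raw reward earned at probability 0.9, which nudges the update toward rarely chosen actions.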