Apparatus and method for training reinforcement learning model for use in combinatorial optimization


Bibliographic details
Main authors:	Kwon, Yeong Dae; Yoon, Il Joo; Kim, Byoung Jip; Choo, Jin Ho
Format:	Patent
Language:	English
Subjects:	CALCULATING; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS; COMPUTING; COUNTING; PHYSICS

Description
Abstract:	An apparatus for training a reinforcement learning model according to an embodiment includes a starting point determinator configured to determine starting points from an input value of a combinatorial optimization problem, a multi-explorer configured to generate exploration trajectories by performing exploration from each of the starting points using a reinforcement learning model, a trajectory evaluator configured to calculate an evaluation value of each of the exploration trajectories using an evaluation function of the combinatorial optimization problem, a baseline calculator configured to calculate a baseline for the input value from the evaluation value of each exploration trajectory, an advantage calculator configured to calculate an advantage of each of the exploration trajectories using the evaluation value of each exploration trajectory and the baseline, and a parameter updater configured to update parameters of the reinforcement learning model by using the exploration trajectories and the advantage of each exploration trajectory.
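
The six components named in the abstract can be sketched as one training step. This is an illustrative sketch only, not the patented implementation. It assumes a small travelling-salesman instance as the combinatorial optimization problem, a tabular softmax policy `theta` standing in for the reinforcement learning model, every node used as a starting point, the mean evaluation value as the baseline, and the advantage defined as evaluation value minus baseline. The abstract states none of these specifics, and all function names are hypothetical.

```python
import math
import random


def determine_starting_points(coords):
    # Assumption: every node of the instance serves as one starting point.
    return list(range(len(coords)))


def explore(theta, start, rng):
    """Sample one tour from `start`; return it with the gradient of its log-probability."""
    n = len(theta)
    tour, visited = [start], {start}
    grad = [[0.0] * n for _ in range(n)]
    while len(tour) < n:
        cur = tour[-1]
        cand = [j for j in range(n) if j not in visited]
        # Masked softmax over unvisited nodes (numerically stabilised).
        top = max(theta[cur][j] for j in cand)
        weights = [math.exp(theta[cur][j] - top) for j in cand]
        total = sum(weights)
        probs = [w / total for w in weights]
        nxt = rng.choices(cand, weights=probs)[0]
        # d log p(nxt) / d theta[cur][j] = 1[j == nxt] - p_j
        for j, p in zip(cand, probs):
            grad[cur][j] += (1.0 if j == nxt else 0.0) - p
        tour.append(nxt)
        visited.add(nxt)
    return tour, grad


def evaluate(coords, tour):
    # Assumed evaluation function: negative closed-tour length (larger is better).
    n = len(tour)
    return -sum(math.dist(coords[tour[i]], coords[tour[(i + 1) % n]]) for i in range(n))


def train_step(theta, coords, rng, lr=0.1):
    starts = determine_starting_points(coords)                  # starting point determinator
    rollouts = [explore(theta, s, rng) for s in starts]         # multi-explorer
    values = [evaluate(coords, tour) for tour, _ in rollouts]   # trajectory evaluator
    baseline = sum(values) / len(values)                        # baseline calculator (assumed: mean)
    advantages = [v - baseline for v in values]                 # advantage calculator (assumed: difference)
    n = len(theta)
    for (_, grad), adv in zip(rollouts, advantages):            # parameter updater (REINFORCE-style)
        for i in range(n):
            for j in range(n):
                theta[i][j] += lr * adv * grad[i][j] / len(rollouts)
    return values, baseline, advantages


if __name__ == "__main__":
    rng = random.Random(0)
    coords = [(rng.random(), rng.random()) for _ in range(6)]
    theta = [[0.0] * len(coords) for _ in coords]
    for _ in range(200):
        values, baseline, _ = train_step(theta, coords, rng)
    print("mean tour length after training:", -baseline)
```

Because the baseline in this sketch is computed from the trajectories of the same input, the advantages sum to zero, so trajectories are reinforced only relative to one another.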