Apparatus and method for training reinforcement learning model for use in combinatorial optimization
Main authors: | , , , |
---|---|
Format: | Patent |
Language: | eng |
Subjects: | |
Abstract: | An apparatus for training a reinforcement learning model according to an embodiment includes a starting point determinator configured to determine starting points from an input value of a combinatorial optimization problem, a multi-explorer configured to generate exploration trajectories by performing exploration from each of the starting points using a reinforcement learning model, a trajectory evaluator configured to calculate an evaluation value of each of the exploration trajectories using an evaluation function of the combinatorial optimization problem, a baseline calculator configured to calculate a baseline for the input value from the evaluation value of each exploration trajectory, an advantage calculator configured to calculate an advantage of each of the exploration trajectories using the evaluation value of each exploration trajectory and the baseline, and a parameter updater configured to update parameters of the reinforcement learning model by using the exploration trajectories and the advantage of each exploration trajectory. |
---|
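The pipeline in the abstract (starting points, one exploration per starting point, evaluation, shared baseline, advantage, parameter update) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the patent: the problem is a small travelling-salesman instance, every city is used as a starting point, the evaluation value is the negative tour length, the baseline is the mean evaluation value over all trajectories of the same input, and the model is a tabular softmax policy `theta` updated with a plain policy-gradient step. The function names `rollout`, `tour_length`, and `train_step` are hypothetical.

```python
import math
import random


def rollout(theta, start, rng):
    """Multi-explorer step: sample one tour from `start` and return it together
    with the gradient of its log-probability w.r.t. theta (tabular softmax)."""
    n = len(theta)
    tour, visited = [start], {start}
    grad = [[0.0] * n for _ in range(n)]
    for _ in range(n - 1):
        cur = tour[-1]
        free = [j for j in range(n) if j not in visited]
        top = max(theta[cur][j] for j in free)  # subtract max for numerical stability
        weights = [math.exp(theta[cur][j] - top) for j in free]
        total = sum(weights)
        probs = [w / total for w in weights]
        nxt = rng.choices(free, weights=probs)[0]
        # d log p(nxt) / d theta[cur][j] = 1[j == nxt] - p(j) over unvisited j
        for j, p in zip(free, probs):
            grad[cur][j] += (1.0 if j == nxt else 0.0) - p
        visited.add(nxt)
        tour.append(nxt)
    return tour, grad


def tour_length(coords, tour):
    """Length of the closed tour (returns to the first city)."""
    return sum(math.dist(coords[a], coords[b])
               for a, b in zip(tour, tour[1:] + tour[:1]))


def train_step(theta, coords, rng, lr=0.1):
    n = len(coords)
    starts = list(range(n))  # assumption: every city is a starting point
    results = [rollout(theta, s, rng) for s in starts]
    # Trajectory evaluator: higher is better, so use the negative tour length.
    values = [-tour_length(coords, tour) for tour, _ in results]
    # Baseline calculator: assumed here to be the mean over all trajectories.
    baseline = sum(values) / len(values)
    # Advantage calculator: evaluation value minus the shared baseline.
    advantages = [v - baseline for v in values]
    # Parameter updater: gradient ascent on the advantage-weighted log-probability.
    new_theta = [row[:] for row in theta]
    for adv, (_, grad) in zip(advantages, results):
        for i in range(n):
            for j in range(n):
                new_theta[i][j] += lr * adv * grad[i][j] / len(starts)
    return new_theta, values, advantages


rng = random.Random(0)
coords = [(rng.random(), rng.random()) for _ in range(6)]
theta = [[0.0] * 6 for _ in range(6)]
theta, values, advantages = train_step(theta, coords, rng)
```

With a mean baseline, the advantages of the trajectories for one input sum to zero, so tours that are better than average for that input are reinforced and worse ones are discouraged. Other baselines computed from the same evaluation values would fit the abstract's wording equally well.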