Distributed policy search reinforcement learning for job-shop scheduling tasks


Bibliographic details
Published in: International journal of production research 2012-01, Vol.50 (1), p.41-61
Main authors: Gabel, Thomas; Riedmiller, Martin
Format: Article
Language: English
Online access: Full text
Description
Summary: We interpret job-shop scheduling problems as sequential decision problems that are handled by independent learning agents. These agents act completely decoupled from one another and employ probabilistic dispatching policies for which we propose a compact representation using a small set of real-valued parameters. During ongoing learning, the agents adapt these parameters using policy gradient reinforcement learning, with the aim of improving the performance of the joint policy measured in terms of a standard scheduling objective function. Moreover, we suggest a lightweight communication mechanism that enhances the agents' capabilities beyond purely reactive job dispatching. We evaluate the effectiveness of our learning approach using various deterministic as well as stochastic job-shop scheduling benchmark problems, demonstrating that the utilisation of policy gradient methods can be effective and beneficial for scheduling problems.
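Illustrative note: the summary names the ingredients of the approach (a probabilistic dispatching policy with a few real-valued parameters, adapted by policy gradient learning) but this record does not give the authors' parameterisation, objective, or communication mechanism. The following is therefore only a minimal sketch of a generic REINFORCE-style update for a single dispatching agent, not the method of the article. It assumes one machine, one preference parameter per job, a softmax policy over the waiting jobs, and total completion time as the cost. All names (`softmax_choice`, `run_episode`, `train`, `proc_times`) are hypothetical.

```python
import math
import random


def softmax_choice(theta, waiting, rng):
    """Sample one waiting job with probability proportional to exp(theta[job])."""
    top = max(theta[j] for j in waiting)  # subtract the max for numerical stability
    weights = [math.exp(theta[j] - top) for j in waiting]
    total = sum(weights)
    probs = [w / total for w in weights]
    job = rng.choices(waiting, weights=probs, k=1)[0]
    return job, dict(zip(waiting, probs))


def run_episode(theta, proc_times, rng):
    """Dispatch every job once on a single machine (assumed toy setting).

    Returns the total completion time and the gradient of the
    log-probability of the sampled dispatching sequence w.r.t. theta.
    """
    waiting = list(range(len(proc_times)))
    grad = [0.0] * len(proc_times)
    clock = 0.0
    cost = 0.0
    while waiting:
        job, probs = softmax_choice(theta, waiting, rng)
        # Softmax score function: d log pi(job) / d theta_k = 1[k == job] - pi(k)
        for k in waiting:
            grad[k] += (1.0 if k == job else 0.0) - probs[k]
        clock += proc_times[job]
        cost += clock  # completion time of the job just finished
        waiting.remove(job)
    return cost, grad


def train(proc_times, episodes=3000, lr=0.05, seed=0):
    """Adapt theta by stochastic gradient ascent on the negative cost."""
    rng = random.Random(seed)
    theta = [0.0] * len(proc_times)
    baseline = None  # running average cost, used to reduce variance
    for _ in range(episodes):
        cost, grad = run_episode(theta, proc_times, rng)
        if baseline is None:
            baseline = cost
        advantage = baseline - cost  # positive when this episode beat the average
        for k in range(len(theta)):
            theta[k] += lr * advantage * grad[k]
        baseline = 0.9 * baseline + 0.1 * cost
    return theta


if __name__ == "__main__":
    learned = train([3.0, 1.0, 2.0])
    print("learned preferences:", learned)
```

In this toy setting a higher learned preference means a job tends to be dispatched earlier. The multi-agent, multi-machine case described in the summary would need one such parameter set per agent plus a shared objective, which this sketch does not attempt to model.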
ISSN: 0020-7543, 1366-588X
DOI: 10.1080/00207543.2011.571443