A Cooperative Hierarchical Deep Reinforcement Learning based Multi-agent Method for Distributed Job Shop Scheduling Problem with Random Job Arrivals
Published in: Computers & Industrial Engineering, 2023-11, Vol. 185, Article 109650
Main authors: , , ,
Format: Article
Language: English
Online access: Full text
Abstract:

• A mathematical model for the dynamic problem is proposed.
• Two agents are designed to solve the two critical subproblems of the problem.
• A new DQN training method with variable threshold probability is designed.
• The effectiveness of the independent agents and the multi-agent method is verified.
Distributed manufacturing reduces production costs through cooperation among factories and has become an important trend in industry. For enterprises whose production tasks are delivered daily, random job arrivals are routine. This paper therefore studies the Distributed Job-shop Scheduling Problem (DJSP) with random job arrivals, a typical case from the equipment manufacturing industry. The DJSP involves two coupled decision-making processes, job assigning and job sequencing, and the distributed, uncertain production environment requires the scheduling method to be responsive and adaptive. A Deep Reinforcement Learning (DRL) based multi-agent method is thus explored, composed of an assigning agent and a sequencing agent, and two Markov Decision Processes (MDPs) are formulated, one for each agent. In the MDP for the assigning agent, fourteen factory-and-job related features are extracted as the state features, seven composite assigning rules are designed as the candidate actions, and the reward depends on the total processing time of the different factories. In the MDP for the sequencing agent, five machine-and-job related features serve as the state features, six sequencing rules make up the action space, and the change in the factory makespan is the reward. In addition, to enhance the learning ability of the agents, a Deep Q-Network (DQN) framework with a variable threshold probability in the training stage is designed, which balances exploitation and exploration during model training. The effectiveness of the proposed multi-agent method is demonstrated by an independent utility test and a comparison test based on 1350 production instances, and its practical value is illustrated by a case study from an automotive engine manufacturing company.
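For intuition, the "variable threshold probability" mechanism can be read as an epsilon-greedy action selector whose exploration threshold is annealed over training. The sketch below illustrates this under stated assumptions: the action-space sizes (seven composite assigning rules, six sequencing rules) come from the abstract, while the linear decay schedule, the function names, and the stand-in Q-values are illustrative and not the authors' exact design.

```python
import random
import numpy as np

# Action-space sizes taken from the abstract; everything else is an assumption.
N_ASSIGN_ACTIONS = 7   # composite assigning rules (assigning agent)
N_SEQ_ACTIONS = 6      # sequencing rules (sequencing agent)

def threshold_probability(step, p_start=1.0, p_end=0.05, decay_steps=10_000):
    """Variable threshold probability: annealed linearly over training so
    early steps favour exploration and later steps favour exploitation."""
    frac = min(step / decay_steps, 1.0)
    return p_start + (p_end - p_start) * frac

def select_action(q_values, step):
    """Epsilon-greedy choice against the current threshold probability."""
    if random.random() < threshold_probability(step):
        return random.randrange(len(q_values))   # explore: random dispatching rule
    return int(np.argmax(q_values))              # exploit: rule with highest Q-value

# Usage: stand-in Q-values in place of the assigning agent's DQN output.
q_values = np.random.rand(N_ASSIGN_ACTIONS)
action = select_action(q_values, step=500)   # early step, so mostly exploration
```

A high initial threshold lets the agents sample many rule combinations early in training; lowering it over time shifts them toward exploiting the learned Q-values, which is the exploitation/exploration balance the abstract attributes to the variable threshold.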
ISSN: 0360-8352
DOI: 10.1016/j.cie.2023.109650