Petri-net-based dynamic scheduling of flexible manufacturing system via deep reinforcement learning with graph convolutional network


Bibliographic Details
Published in: Journal of Manufacturing Systems, 2020-04, Vol. 55, p. 1-14
Main Authors: Hu, Liang; Liu, Zhenyu; Hu, Weifei; Wang, Yueyang; Tan, Jianrong; Wu, Fei
Format: Article
Language: English
Online Access: Full text
Description
Abstract:

Highlights:
• Real-time, adaptive, and easily deployable dynamic scheduling method for FMS.
• Combination of timed-place Petri nets and S3PR to model FMS.
• Introduction of a novel GCN layer to handle hidden information of Petri nets.
• Use of a deep Q-network (DQN) with a GCN to solve the dynamic scheduling problem of FMS.

To benefit from the accurate simulation and high-throughput data contributed by advanced digital twin technologies in modern smart plants, the deep reinforcement learning (DRL) method is an appropriate choice for generating a self-optimizing scheduling policy. This study employs the deep Q-network (DQN), a successful DRL method, to solve the dynamic scheduling problem of flexible manufacturing systems (FMSs) involving shared resources, route flexibility, and stochastic arrivals of raw products. To model the system in consideration of both manufacturing efficiency and deadlock avoidance, we use a class of Petri nets combining timed-place Petri nets and a system of simple sequential processes with resources (S3PR), named the timed S3PR. The dynamic scheduling problem of the timed S3PR is defined as a Markov decision process (MDP) that can be solved by the DQN. To construct deep neural networks that approximate the DQN action-value function mapping timed S3PR states to scheduling rewards, we employ a graph convolutional network (GCN) as the timed S3PR state approximator, proposing a novel graph convolution layer called the Petri-net convolution (PNC) layer. The PNC layer uses the input and output matrices of the timed S3PR to compute the propagation of features from places to transitions and from transitions to places, thereby reducing the number of trainable parameters and ensuring robust convergence of the learning process. Experimental results verify that the proposed DQN with a PNC network provides better solutions to dynamic scheduling problems in terms of manufacturing performance, computational efficiency, and adaptability than heuristic methods and a DQN with basic multilayer perceptrons.
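The abstract's description of the PNC layer suggests a simple message-passing pattern over the net structure. The sketch below (in PyTorch) illustrates one plausible reading: place features are aggregated onto transitions through the input (pre-incidence) matrix and scattered back to places through the output (post-incidence) matrix, so the only trainable parameters are two small linear maps. The class name PNCLayer, the feature dimensions, and the ReLU nonlinearity are illustrative assumptions, not the authors' exact formulation.

```python
import torch
import torch.nn as nn

class PNCLayer(nn.Module):
    """Illustrative Petri-net convolution (PNC) layer sketch.

    Propagates place features to transitions via the input matrix and
    back to places via the output matrix, as outlined in the abstract.
    Dimensions and nonlinearity are assumptions made for illustration.
    """

    def __init__(self, in_dim: int, hidden_dim: int, out_dim: int):
        super().__init__()
        self.place_to_trans = nn.Linear(in_dim, hidden_dim)
        self.trans_to_place = nn.Linear(hidden_dim, out_dim)
        self.act = nn.ReLU()

    def forward(self, x_place: torch.Tensor,
                pre: torch.Tensor, post: torch.Tensor) -> torch.Tensor:
        # x_place: (P, in_dim)  per-place features, e.g. token counts
        # pre:     (T, P)       input matrix (arc weights place -> transition)
        # post:    (P, T)       output matrix (arc weights transition -> place)
        x_trans = self.act(self.place_to_trans(pre @ x_place))  # places -> transitions
        return self.act(self.trans_to_place(post @ x_trans))    # transitions -> places

# Toy usage on a random net with 6 places and 4 transitions
P, T = 6, 4
layer = PNCLayer(in_dim=2, hidden_dim=8, out_dim=8)
pre = torch.randint(0, 2, (T, P)).float()
post = torch.randint(0, 2, (P, T)).float()
out = layer(torch.rand(P, 2), pre, post)  # shape (6, 8)
```

Stacking a few such layers and feeding the resulting place embeddings into a DQN head that scores transitions would match the overall architecture the abstract outlines, though the stacking depth and readout are not specified here.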
ISSN: 0278-6125, 1878-6642
DOI: 10.1016/j.jmsy.2020.02.004