Adaptive job shop scheduling strategy based on weighted Q-learning algorithm


Bibliographic details
Published in:	Journal of intelligent manufacturing 2020-02, Vol.31 (2), p.417-432
Main author:	Wang, Yu-Fang
Format:	Article
Language:	English
Subjects:	Algorithms; Business and Management; Clustering; Computer Science; Computer Science, Artificial Intelligence; Computer simulation; Control; Engineering; Engineering, Manufacturing; Feature extraction; Iterative methods; Job shop scheduling; Job shops; Machine learning; Machine shops; Machines; Manufacturing; Mechatronics; Multiagent systems; Optimization; Processes; Production; Production scheduling; Robotics; Scheduling; Science & Technology; Searching; Strategy; Technology
Online access:	Full text

Description
Summary:	Given the dynamic and uncertain production environment of job shops, a scheduling strategy with adaptive features must be developed to cope with changing production factors. Therefore, a dynamic scheduling system model based on multi-agent technology, including machine, buffer, state, and job agents, was built. A weighted Q-learning algorithm based on clustering and dynamic search was used to determine the most suitable operation and to optimize production. To address the large state space caused by changes in the system state, four state features were extracted, and the dimension of the system state was reduced through clustering. To reduce the error between the actual system states and the clustered ones, a state difference degree was defined and integrated into the iteration formula of the Q function. To select the optimal state-action pair, improved search and iteration update strategies were proposed. Convergence analysis of the proposed algorithm and simulation experiments indicated that the proposed adaptive strategy adapts well and is effective in different scheduling environments, and shows better performance in complex environments. The two contributions of this research are as follows: (1) a dynamic greedy search strategy was developed to avoid the blind searching of the traditional strategy; (2) a weighted iteration update of the Q function, including the weighted mean of the maximum fuzzy earning, was designed to improve the speed and accuracy of the improved learning algorithm.
ISSN:	0956-5515 1572-8145
DOI:	10.1007/s10845-018-1454-3
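
The summary names three ingredients: clustering of a four-feature system state, a state difference degree folded into the Q-function update, and a greedy search whose exploration changes over time. The following Python sketch illustrates how such pieces might fit together. It is not the article's algorithm: the distance-based `difference_degree`, the exponential `epsilon` schedule, the `(1 - diff)` weighting of the learning step, and all function and parameter names are assumptions made for illustration, and the "weighted mean of the maximum fuzzy earning" mentioned in the summary is not modelled here.

```python
import math
import random


def nearest_cluster(features, centers):
    """Return (index, distance) of the cluster center closest to the feature vector."""
    dists = [math.dist(features, c) for c in centers]
    idx = min(range(len(centers)), key=dists.__getitem__)
    return idx, dists[idx]


def difference_degree(distance, scale=1.0):
    """Assumed state difference degree in [0, 1).

    0 means the actual state coincides with its cluster center;
    values near 1 mean the clustered state represents it poorly.
    """
    return distance / (distance + scale)


def epsilon(step, eps_start=0.9, eps_end=0.05, decay=0.01):
    """Assumed exploration schedule: decays from eps_start toward eps_end."""
    return eps_end + (eps_start - eps_end) * math.exp(-decay * step)


def choose_action(q_row, step, rng):
    """Epsilon-greedy choice whose exploration rate shrinks with the step count."""
    if rng.random() < epsilon(step):
        return rng.randrange(len(q_row))
    return max(range(len(q_row)), key=q_row.__getitem__)


def weighted_update(q, s, a, reward, s_next, diff, alpha=0.1, gamma=0.9):
    """Standard Q-learning step, scaled by (1 - diff).

    Assumption: updates from states that their cluster represents poorly
    (large diff) are trusted less and therefore move Q less.
    """
    target = reward + gamma * max(q[s_next])
    q[s][a] += alpha * (1.0 - diff) * (target - q[s][a])
    return q[s][a]


# Two hypothetical cluster centers over four state features, three actions.
centers = [(0.0, 0.0, 0.0, 0.0), (1.0, 1.0, 1.0, 1.0)]
q = [[0.0, 0.0, 0.0] for _ in centers]
rng = random.Random(0)

s, dist = nearest_cluster((0.1, 0.0, 0.2, 0.1), centers)
a = choose_action(q[s], step=0, rng=rng)
weighted_update(q, s, a, reward=1.0, s_next=1, diff=difference_degree(dist))
```

With `diff = 0` the update reduces to ordinary tabular Q-learning, so the weighting only changes behavior for states that lie far from their cluster center.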