Reinforcement learning based optimal control of batch processes using Monte-Carlo deep deterministic policy gradient with phase segmentation

Bibliographic details
Published in:	Computers & Chemical Engineering 2021-01, Vol. 144, p. 107133, Article 107133
Main authors:	Yoo, Haeun; Kim, Boeun; Kim, Jong Woo; Lee, Jay H.
Format:	Article
Language:	English
Keywords:	Actor-Critic; Batch process; Optimal control; Reinforcement learning

Description
Abstract:	Highlights:
• A reward function design is suggested for general economic process control.
• A phase segmentation approach is proposed to address the distinct characteristics of the various phases of a batch run.
• The DDPG algorithm is modified with Monte-Carlo learning for stable agent training.
• The suggested algorithm is applied to a batch polymerization process control problem.
Batch process control is challenging because the process operates dynamically over a large operating envelope. Nonlinear model predictive control (NMPC) is the current standard for optimal control of batch processes, but the performance of conventional NMPC can be unsatisfactory in the presence of uncertainties. Reinforcement learning (RL), which can utilize simulation or real operation data, is a viable alternative for such problems. To apply RL to batch process control effectively, however, choices such as the reward function design and the value update method must be made carefully. This study proposes a phase segmentation approach for the reward function design and for the value/policy function representation. In addition, the deep deterministic policy gradient (DDPG) algorithm is modified with Monte-Carlo learning to ensure more stable and efficient learning behavior. A case study of a batch polymerization process producing polyols is used to demonstrate the improvement brought by the proposed approach and to highlight further issues.
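As a rough illustration of what "modified with Monte-Carlo learning" can mean for the critic, the sketch below contrasts the bootstrapped one-step target of standard DDPG, y_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1})), with a full-episode Monte-Carlo return computed once a batch run has finished. This is a generic sketch under stated assumptions, not the authors' implementation: the function names, the discount factor `gamma`, the mean-squared-error critic loss, and the example reward values are all hypothetical, and the phase segmentation and reward design from the paper are not modeled.

```python
from typing import List, Sequence


def monte_carlo_returns(rewards: Sequence[float], gamma: float = 1.0) -> List[float]:
    """Return G_t = r_t + gamma * G_{t+1} for every step of one finished batch run.

    Computed backward from the final step, so no critic estimate is used
    (hypothetical helper; gamma is an assumed parameter).
    """
    returns = [0.0] * len(rewards)
    g = 0.0
    for t in reversed(range(len(rewards))):
        g = rewards[t] + gamma * g
        returns[t] = g
    return returns


def td_targets(rewards: Sequence[float], next_q_values: Sequence[float],
               dones: Sequence[bool], gamma: float = 1.0) -> List[float]:
    """Standard one-step DDPG target y_t = r_t + gamma * Q'(s_{t+1}, mu'(s_{t+1})).

    `next_q_values` stands in for target-critic outputs; the bootstrap term
    is dropped at the terminal step of the batch.
    """
    return [r + gamma * q * (0.0 if d else 1.0)
            for r, q, d in zip(rewards, next_q_values, dones)]


def critic_mse(q_predictions: Sequence[float], targets: Sequence[float]) -> float:
    """Mean squared error between critic predictions Q(s_t, a_t) and chosen targets."""
    n = len(q_predictions)
    return sum((q - y) ** 2 for q, y in zip(q_predictions, targets)) / n


if __name__ == "__main__":
    # Hypothetical four-step batch run where most reward arrives at the end.
    rewards = [0.0, 0.0, -1.0, 5.0]
    print(monte_carlo_returns(rewards))             # [4.0, 4.0, 4.0, 5.0]
    print(monte_carlo_returns(rewards, gamma=0.5))  # [0.375, 0.75, 1.5, 5.0]
```

The Monte-Carlo target depends only on observed rewards, so it does not inherit errors from an immature target critic. This is one common motivation for preferring it when rewards are concentrated near the end of a batch; the usual trade-off is higher variance than the bootstrapped target.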
ISSN:	0098-1354, 1873-4375
DOI:	10.1016/j.compchemeng.2020.107133