Safe, visualizable reinforcement learning for process control with a warm-started actor network based on PI-control


Bibliographic details
Published in: Journal of Process Control 2024-12, Vol.144, p.103340, Article 103340
Main authors: Bras, Edward H., Louw, Tobias M., Bradshaw, Steven M.
Format: Article
Language: English
Description
Summary: The adoption of reinforcement learning (RL) in chemical process industries is currently hindered by the use of black-box models that cannot be easily visualized or interpreted, and by the challenge of balancing safe control with exploration. Clearly illustrating the similarities between classical control theory and RL theory, as well as demonstrating the possibility of maintaining process safety under RL-based control, will go a long way towards bridging the gap between academic research and industry practice. In this work, a simple approach is introduced for the dynamic online adaptation, through RL, of a non-linear control policy initialised using PI control. The familiar PI controller is represented as a plane in the state-action space, where the states comprise the error and integral error, and the action is the control input. The plane is recreated using a neural network, and this recreated plane serves as a readily visualizable initial “warm-started” policy for the RL agent. The actor-critic algorithm is applied to adapt the policy non-linearly during interaction with the controlled process, thereby leveraging the flexibility of the neural network to improve performance. Inherently safe control during training is ensured by introducing a soft active region component in the actor neural network. Finally, the use of cold connections is proposed, whereby the state space can be augmented at any stage of training (e.g., through the incorporation of measurements to facilitate feedforward control) while fully preserving the agent’s training progress to date. By ensuring controller safety, the proposed methods are applicable to the dynamic adaptation of any process where stable PI control is feasible at nominal initial conditions.
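The idea of a PI controller as a plane in the state-action space can be illustrated with a small sketch. The record does not say how the authors built their network, so everything below is an assumption made for illustration: the function names, the gain values, and in particular the exact construction, which uses the identity x = relu(x) - relu(-x) to reproduce the plane u = Kp·e + Ki·∫e with four hidden ReLU units rather than fitting it by regression.

```python
def relu(x):
    return x if x > 0.0 else 0.0


def pi_action(error, integral_error, kp, ki):
    # Classical PI law: a plane over the (error, integral error) state space.
    return kp * error + ki * integral_error


def warm_started_actor(kp, ki):
    # Hypothetical construction (not taken from the paper):
    # hidden units compute relu(e), relu(-e), relu(ie), relu(-ie),
    # and the output layer recombines them into kp*e + ki*ie.
    return {
        "w_hidden": [[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]],
        "b_hidden": [0.0, 0.0, 0.0, 0.0],
        "w_out": [kp, -kp, ki, -ki],
        "b_out": 0.0,
    }


def actor_forward(params, state):
    # One hidden ReLU layer followed by a linear output (the control input).
    hidden = [
        relu(sum(w * s for w, s in zip(row, state)) + b)
        for row, b in zip(params["w_hidden"], params["b_hidden"])
    ]
    return sum(w * h for w, h in zip(params["w_out"], hidden)) + params["b_out"]


# Illustrative gains only; real values depend on the tuned PI controller.
actor = warm_started_actor(kp=2.0, ki=0.5)
state = (1.5, -3.0)  # (error, integral error)
u_network = actor_forward(actor, state)
u_pi = pi_action(state[0], state[1], kp=2.0, ki=0.5)
```

Before any training, `u_network` and `u_pi` agree for every state, so the network starts as the PI plane. Because the hidden units are non-linear, later weight updates by an RL algorithm could bend the plane into a non-linear policy.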
• Actor-critic reinforcement learning agent initialised using a PI-control policy.
• Neural network architecture enables a non-linear control policy and safe exploration.
• Non-updating transitions enable exploration through artificial state adjustments.
• Cold connections efficiently incorporate new states for feedforward control.
• RL framed as an inherent connection between classical and optimal control.
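The record does not describe how cold connections are implemented. One plausible reading, offered purely as an assumption, is that a new input is wired into the existing actor with zero-initialised weights, so the policy output is unchanged at the moment the state space is augmented. The sketch below shows that property on a tiny hand-written network; the weights, the helper names `forward` and `add_cold_input`, and the extra measurement value are all hypothetical.

```python
def relu(x):
    return max(0.0, x)


def forward(w_hidden, w_out, state):
    # Single hidden ReLU layer, linear output; biases omitted for brevity.
    hidden = [relu(sum(w * s for w, s in zip(row, state))) for row in w_hidden]
    return sum(w * h for w, h in zip(w_out, hidden))


def add_cold_input(w_hidden):
    # Assumed mechanism: append a zero weight for the new input to every
    # hidden unit, leaving all previously trained weights untouched.
    return [row + [0.0] for row in w_hidden]


# Arbitrary "trained" weights for a two-input actor (error, integral error).
w_hidden = [[0.8, -0.3], [-0.5, 0.9], [0.2, 0.4]]
w_out = [1.0, -2.0, 0.5]

state = (0.6, -1.2)
before = forward(w_hidden, w_out, state)

# Augment the state with a new measurement, e.g. a disturbance for feedforward.
w_augmented = add_cold_input(w_hidden)
after = forward(w_augmented, w_out, state + (3.7,))
```

Under this assumption `before` and `after` are equal whatever value the new measurement takes, which is consistent with the claim that training progress is fully preserved. The new weights would only start to influence the action once subsequent updates move them away from zero.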
ISSN:0959-1524
DOI:10.1016/j.jprocont.2024.103340