Optimal PID and Antiwindup Control Design as a Reinforcement Learning Problem

Bibliographic Details
Published in:	arXiv.org 2020-05
Main authors:	Lawrence, Nathan P; Stewart, Gregory E; Loewen, Philip D; Forbes, Michael G; Backstrom, Johan U; Gopaluni, R Bhushan
Format:	Article
Language:	English
Subjects:	Actuators; Anti-windup; Artificial neural networks; Computer Science - Learning; Computer Science - Systems and Control; Control methods; Control stability; Controllers; Machine learning; Mathematics - Optimization and Control; Neural networks; Nonlinear control; Process controls; Proportional integral derivative; Structural stability
Online access:	Full text

Description
Abstract:	Deep reinforcement learning (DRL) has seen several successful applications to process control. Common methods rely on a deep neural network structure to model the controller or process. With increasingly complicated control structures, the closed-loop stability of such methods becomes less clear. In this work, we focus on the interpretability of DRL control methods. In particular, we view linear fixed-structure controllers as shallow neural networks embedded in the actor-critic framework. PID controllers guide our development due to their simplicity and acceptance in industrial practice. We then consider input saturation, leading to a simple nonlinear control structure. To operate effectively within the actuator limits, we then incorporate a tuning parameter for anti-windup compensation. Finally, the simplicity of the controller allows for straightforward initialization. This makes our method inherently stabilizing, both during and after training, and amenable to known operational PID gains.
ISSN:	2331-8422
DOI:	10.48550/arxiv.2005.04539
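
Illustration (not part of the catalog record): the abstract describes a PID controller read as a shallow network, with input saturation and one anti-windup tuning parameter. The Python sketch below is a generic textbook back-calculation scheme and is not taken from the paper, whose exact formulation this record does not give. The class name `PIDAntiWindup`, the parameter name `rho`, and all numeric values are assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PIDAntiWindup:
    """PID controller with input saturation and back-calculation anti-windup.

    The unsaturated output is a dot product between the weights (kp, ki, kd)
    and the features (error, integrated error, error derivative), i.e. a
    single linear layer without bias. Clipping to [u_min, u_max] is the only
    nonlinearity. All names here are hypothetical.
    """
    kp: float
    ki: float
    kd: float
    rho: float      # anti-windup gain (assumed name); 0 disables compensation
    u_min: float
    u_max: float
    dt: float
    integral: float = 0.0
    prev_error: float = 0.0

    def step(self, error: float) -> float:
        # Finite-difference derivative; prev_error starts at 0, so a nonzero
        # kd produces a derivative kick on the very first call.
        derivative = (error - self.prev_error) / self.dt
        features = (error, self.integral, derivative)
        weights = (self.kp, self.ki, self.kd)
        u_raw = sum(w * x for w, x in zip(weights, features))
        # Actuator limits: clip the raw control signal.
        u_sat = min(max(u_raw, self.u_min), self.u_max)
        # Back-calculation: while saturated, (u_sat - u_raw) opposes further
        # growth of the integrator.
        self.integral += self.dt * (error + self.rho * (u_sat - u_raw))
        self.prev_error = error
        return u_sat
```

Under these assumptions, the three gains and `rho` are the only tunable weights, so known operational PID gains could serve directly as the initial values. Holding the error constant while the actuator is saturated shows the effect of `rho`: with `rho = 0` the integrator grows without bound, while a positive `rho` lets it settle.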