A novel investigation on the effects of state and reward structure in designing deep reinforcement learning-based controller for nonlinear dynamical systems

Bibliographic Details
Published in: International Journal of Dynamics and Control, 2024-08, Vol. 12 (8), pp. 3017-3032
Authors: Mukhopadhyay, Rajarshi; Sutradhar, Ashoke; Chattopadhyay, Paramita
Format: Article
Language: English
Online Access: Full text
Abstract: In the last decade, the popularity of deep reinforcement learning (DRL)-based controller design for complex and uncertain nonlinear dynamic systems has grown rapidly owing to its model-free approach. Most studies focus on algorithmic developments that improve the learning process; however, the performance of the final learned control policy in closed-loop deployment remains underexplored. That performance depends strongly on the careful selection of the input feature set, or state, and on the design of the reward structure. The present investigation is a novel simulation study demonstrating the impact of these two critical factors on a benchmark nonlinear control problem: the swing-up and stabilisation of an inverted pendulum on a cart with restricted cart movement. The study compares raw sensor-signal-based features and physics-inspired, hand-crafted, system-knowledge-driven features against abstract features self-synthesised by a deep autoencoder, using a deep deterministic policy gradient (DDPG)-type DRL algorithm for controller design in a continuous action space. In addition, the study examines how reward structures that augment the standard quadratic reward formulation with sparse, situation-based, intuitive incentive-penalty terms affect the learned policy. Finally, a closed-loop study establishes the superiority of specific combinations of feature set/state and reward structure over the rest by comparing and analysing standard performance metrics and the energy efficiency of the control action.
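The abstract contrasts hand-crafted, physics-inspired state features with a standard quadratic reward augmented by sparse, situation-based incentive-penalty terms. The Python sketch below illustrates what such designs can look like for the cart-pole swing-up task; the feature choices, weights, thresholds, and the track bound x_limit are hypothetical assumptions for illustration only, not values taken from the paper.

    import numpy as np

    # Illustrative sketch only: the feature set, weights, and thresholds
    # below are hypothetical stand-ins for the kinds of designs the
    # abstract describes, not the authors' actual formulation.

    def handcrafted_features(x, x_dot, theta, theta_dot):
        """Physics-inspired state features for the cart-pole: encoding the
        pole angle as (sin, cos) removes the 2*pi discontinuity that a raw
        angle signal presents to the policy network."""
        return np.array([x, x_dot, np.sin(theta), np.cos(theta), theta_dot])

    def quadratic_reward(x, theta, u, q_x=0.1, q_theta=1.0, r_u=0.001):
        """Standard quadratic reward: penalise deviation from the upright
        equilibrium (theta = 0, x = 0) and large control effort u."""
        return -(q_x * x**2 + q_theta * theta**2 + r_u * u**2)

    def shaped_reward(x, theta, u, x_limit=2.4):
        """Quadratic reward augmented with sparse, situation-based
        incentive-penalty terms: a bonus when the pole is near upright and
        a large penalty for violating the restricted cart-travel bound."""
        r = quadratic_reward(x, theta, u)
        if abs(theta) < 0.1:   # incentive: pole close to upright
            r += 1.0
        if abs(x) > x_limit:   # penalty: cart leaves the allowed track
            r -= 100.0
        return r

The sparse bonus and constraint penalty layered on the quadratic base term are one plausible reading of the "incentive-penalty enhancements" the abstract mentions; the paper itself compares several such structures.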
ISSN: 2195-268X, 2195-2698
DOI: 10.1007/s40435-024-01407-6