An Actor-Critic Framework for Online Control With Environment Stability Guarantee
Published in: IEEE Access, 2023, Vol. 11, pp. 89188-89204
Main Authors:
Format: Article
Language: English
Subjects:
Online Access: Full text
Summary: Online actor-critic reinforcement learning is concerned with training an agent on the fly through dynamic interaction with the environment. Due to the specifics of such applications, long pre-training, as commonly done in offline, tabular, or Monte-Carlo mode, is generally not possible. Such applications are found more frequently in industry than in purely digital fields, such as cloud services, video games, and database management, where reinforcement learning has demonstrated success. Stability of the closed loop formed by the agent and the environment is a major challenge here, not only in terms of environment safety and integrity, but also in terms of sparing resources on failed training episodes. In this paper, we tackle the problem of environment stability under an actor-critic reinforcement learning agent by integrating tools from Lyapunov stability theory. Under the presented approach, closed-loop stability is secured in all episodes without pre-training. In a case study with a mobile robot, the suggested agent always achieved the control goal while significantly reducing the cost. While many approaches may be exploited for mobile robot control, the experiments showed the promising potential of actor-critic reinforcement learning agents based on Lyapunov-like constraints. The presented methodology may be utilized in safety-critical, industrial applications where stability is necessary.
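To illustrate the general idea of a Lyapunov-like constraint in an online actor-critic loop, the following is a minimal sketch, not the authors' actual algorithm: the actor's proposed action is accepted only if the critic, treated as a Lyapunov-function candidate, has decayed along the trajectory; otherwise a known stabilizing fallback controller is applied. The names `actor`, `critic`, `safe_controller`, and the decay margin `nu` are illustrative assumptions, not identifiers from the paper.

```python
# Sketch of a Lyapunov-gated action choice for an online actor-critic agent.
# Assumption: the critic value V(s) is used as a Lyapunov-function candidate,
# and a known stabilizing controller is available as a fallback.
import numpy as np


def lyapunov_gated_action(state, prev_state, actor, critic, safe_controller, nu=1e-3):
    """Return the actor's action if the Lyapunov-like decay condition
    V(s_t) - V(s_{t-1}) <= -nu holds; otherwise fall back to a safe action."""
    if critic(state) - critic(prev_state) <= -nu:
        return actor(state)           # decay condition satisfied: keep learning freely
    return safe_controller(state)     # condition violated: use stabilizing fallback


# Toy usage on a scalar integrator x_{t+1} = x_t + u_t (hypothetical setup):
if __name__ == "__main__":
    critic = lambda s: float(s ** 2)            # quadratic Lyapunov candidate
    actor = lambda s: np.random.uniform(-1, 1)  # untrained, possibly destabilizing actor
    safe_controller = lambda s: -0.5 * s        # known stabilizing feedback

    x_prev, x = 2.0, 1.9
    for _ in range(20):
        u = lyapunov_gated_action(x, x_prev, actor, critic, safe_controller)
        x_prev, x = x, x + u
    print("final state:", x)
```

The gating keeps the closed loop from drifting away even while the actor is still untrained, which is the role the Lyapunov-like constraints play in the abstract above; the paper's own construction should be consulted for the precise conditions used.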
ISSN: 2169-3536
DOI: 10.1109/ACCESS.2023.3306070