A Lyapunov approach for stable reinforcement learning


Full description

Bibliographic details
Published in: Computational & Applied Mathematics 2022-09, Vol. 41 (6), Article 279
Author: Clempner, Julio B.
Format: Article
Language: English
Online access: Full text
Description
Summary: Our strategy is based on a novel reinforcement-learning (RL) Lyapunov methodology. We propose a method for constructing Lyapunov-like functions using a feed-forward Markov decision process. These functions are important for ensuring the stability of a behavior policy throughout the learning process. We show that the cost sequence corresponding to the optimal policy is frequently non-monotonic, which implies that convergence cannot be guaranteed from the costs alone. For any Markov-ergodic process, our technique generates a Lyapunov-like function in one-to-one correspondence with the current cost function, yielding monotonically non-increasing behavior along the trajectories under realization of the optimal strategy. We show that the system's dynamics and trajectories converge, and we explain how to apply the Lyapunov method to solve RL problems. We test the proposed approach to demonstrate its efficacy.
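
To make the descent property concrete, the following is a minimal sketch only, not the paper's construction: on a toy deterministic shortest-path MDP with nonnegative costs, the optimal cost-to-go V* itself behaves as a Lyapunov-like function, satisfying V*(s_{t+1}) <= V*(s_t) along any trajectory of the greedy policy, even though individual stage costs need not decrease. The chain MDP, the names step and greedy_action, and all parameters below are our own illustrative assumptions.

```python
import numpy as np

# Illustrative sketch only -- NOT the paper's construction. On a toy
# deterministic shortest-path MDP with nonnegative costs, the optimal
# cost-to-go V* acts as a Lyapunov-like function: it is monotonically
# non-increasing along every trajectory of the greedy (optimal) policy.

N = 10                 # chain states 0..N-1; state 0 is an absorbing goal
ACTIONS = (-1, +1)     # move left / move right

def step(s, a):
    """Deterministic transition: returns (next state, stage cost)."""
    if s == 0:
        return 0, 0.0                      # goal is absorbing and cost-free
    return min(max(s + a, 0), N - 1), 1.0  # unit cost elsewhere

# Undiscounted value iteration for the cost-to-go V*.
V = np.zeros(N)
for _ in range(10 * N):
    V_new = np.array([min(step(s, a)[1] + V[step(s, a)[0]] for a in ACTIONS)
                      for s in range(N)])
    if np.max(np.abs(V_new - V)) < 1e-12:
        break
    V = V_new

def greedy_action(s):
    return min(ACTIONS, key=lambda a: step(s, a)[1] + V[step(s, a)[0]])

# Roll out the greedy policy and check the Lyapunov-like descent property.
s = N - 1
values = [V[s]]
while s != 0:
    s, _ = step(s, greedy_action(s))
    values.append(V[s])

assert all(b <= a for a, b in zip(values, values[1:]))  # V* never increases
print("V* along the optimal trajectory:", values)
```

As we read the abstract, the paper's contribution is to construct such a monotone Lyapunov-like function for general Markov-ergodic processes, where the raw cost sequence itself need not decrease.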
ISSN: 2238-3603; 1807-0302
DOI: 10.1007/s40314-022-01988-y