Koopman-Assisted Reinforcement Learning
The Bellman equation and its continuous form, the Hamilton-Jacobi-Bellman (HJB) equation, are ubiquitous in reinforcement learning (RL) and control theory. However, these equations quickly become intractable for systems with high-dimensional states and nonlinearity. This paper explores the connectio...
Gespeichert in:
Veröffentlicht in: | arXiv.org 2024-03 |
---|---|
Hauptverfasser: | , , , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | The Bellman equation and its continuous form, the Hamilton-Jacobi-Bellman (HJB) equation, are ubiquitous in reinforcement learning (RL) and control theory. However, these equations quickly become intractable for systems with high-dimensional states and nonlinearity. This paper explores the connection between the data-driven Koopman operator and Markov Decision Processes (MDPs), resulting in the development of two new RL algorithms to address these limitations. We leverage Koopman operator techniques to lift a nonlinear system into new coordinates where the dynamics become approximately linear, and where HJB-based methods are more tractable. In particular, the Koopman operator is able to capture the expectation of the time evolution of the value function of a given system via linear dynamics in the lifted coordinates. By parameterizing the Koopman operator with the control actions, we construct a ``Koopman tensor'' that facilitates the estimation of the optimal value function. Then, a transformation of Bellman's framework in terms of the Koopman tensor enables us to reformulate two max-entropy RL algorithms: soft value iteration and soft actor-critic (SAC). This highly flexible framework can be used for deterministic or stochastic systems as well as for discrete or continuous-time dynamics. Finally, we show that these Koopman Assisted Reinforcement Learning (KARL) algorithms attain state-of-the-art (SOTA) performance with respect to traditional neural network-based SAC and linear quadratic regulator (LQR) baselines on four controlled dynamical systems: a linear state-space system, the Lorenz system, fluid flow past a cylinder, and a double-well potential with non-isotropic stochastic forcing. |
---|---|
ISSN: | 2331-8422 |