Koopman-Assisted Reinforcement Learning

The Bellman equation and its continuous form, the Hamilton-Jacobi-Bellman (HJB) equation, are ubiquitous in reinforcement learning (RL) and control theory. However, these equations quickly become intractable for systems with high-dimensional states and nonlinearity. This paper explores the connectio...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	arXiv.org 2024-03
Hauptverfasser:	Preston Rozwood, Mehrez, Edward, Paehler, Ludger, Sun, Wen, Brunton, Steven L
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Control theory Dynamical systems Fluid flow Linear quadratic regulator Lorenz system Machine learning Markov processes Mathematical analysis Neural networks Nonlinear systems Nonlinearity Operators (mathematics) Stochastic systems Tensors
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Preston Rozwood Mehrez, Edward Paehler, Ludger Sun, Wen Brunton, Steven L
description	The Bellman equation and its continuous form, the Hamilton-Jacobi-Bellman (HJB) equation, are ubiquitous in reinforcement learning (RL) and control theory. However, these equations quickly become intractable for systems with high-dimensional states and nonlinearity. This paper explores the connection between the data-driven Koopman operator and Markov Decision Processes (MDPs), resulting in the development of two new RL algorithms to address these limitations. We leverage Koopman operator techniques to lift a nonlinear system into new coordinates where the dynamics become approximately linear, and where HJB-based methods are more tractable. In particular, the Koopman operator is able to capture the expectation of the time evolution of the value function of a given system via linear dynamics in the lifted coordinates. By parameterizing the Koopman operator with the control actions, we construct a ``Koopman tensor'' that facilitates the estimation of the optimal value function. Then, a transformation of Bellman's framework in terms of the Koopman tensor enables us to reformulate two max-entropy RL algorithms: soft value iteration and soft actor-critic (SAC). This highly flexible framework can be used for deterministic or stochastic systems as well as for discrete or continuous-time dynamics. Finally, we show that these Koopman Assisted Reinforcement Learning (KARL) algorithms attain state-of-the-art (SOTA) performance with respect to traditional neural network-based SAC and linear quadratic regulator (LQR) baselines on four controlled dynamical systems: a linear state-space system, the Lorenz system, fluid flow past a cylinder, and a double-well potential with non-isotropic stochastic forcing.
format	Article
fullrecord	<record><control><sourceid>proquest</sourceid><recordid>TN_cdi_proquest_journals_2937451028</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>2937451028</sourcerecordid><originalsourceid>FETCH-proquest_journals_29374510283</originalsourceid><addsrcrecordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mRQ987PL8hNzNN1LC7OLC5JTVEISs3MS8svSk7NTc0rUfBJTSzKy8xL52FgTUvMKU7lhdLcDMpuriHOHroFRfmFpanFJfFZ-aVFeUCpeCNLY3MTU0MDIwtj4lQBAIzeMAE</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>2937451028</pqid></control><display><type>article</type><title>Koopman-Assisted Reinforcement Learning</title><source>Free E- Journals</source><creator>Preston Rozwood ; Mehrez, Edward ; Paehler, Ludger ; Sun, Wen ; Brunton, Steven L</creator><creatorcontrib>Preston Rozwood ; Mehrez, Edward ; Paehler, Ludger ; Sun, Wen ; Brunton, Steven L</creatorcontrib><description>The Bellman equation and its continuous form, the Hamilton-Jacobi-Bellman (HJB) equation, are ubiquitous in reinforcement learning (RL) and control theory. However, these equations quickly become intractable for systems with high-dimensional states and nonlinearity. This paper explores the connection between the data-driven Koopman operator and Markov Decision Processes (MDPs), resulting in the development of two new RL algorithms to address these limitations. We leverage Koopman operator techniques to lift a nonlinear system into new coordinates where the dynamics become approximately linear, and where HJB-based methods are more tractable. In particular, the Koopman operator is able to capture the expectation of the time evolution of the value function of a given system via linear dynamics in the lifted coordinates. By parameterizing the Koopman operator with the control actions, we construct a ``Koopman tensor'' that facilitates the estimation of the optimal value function. Then, a transformation of Bellman's framework in terms of the Koopman tensor enables us to reformulate two max-entropy RL algorithms: soft value iteration and soft actor-critic (SAC). This highly flexible framework can be used for deterministic or stochastic systems as well as for discrete or continuous-time dynamics. Finally, we show that these Koopman Assisted Reinforcement Learning (KARL) algorithms attain state-of-the-art (SOTA) performance with respect to traditional neural network-based SAC and linear quadratic regulator (LQR) baselines on four controlled dynamical systems: a linear state-space system, the Lorenz system, fluid flow past a cylinder, and a double-well potential with non-isotropic stochastic forcing.</description><identifier>EISSN: 2331-8422</identifier><language>eng</language><publisher>Ithaca: Cornell University Library, arXiv.org</publisher><subject>Algorithms ; Control theory ; Dynamical systems ; Fluid flow ; Linear quadratic regulator ; Lorenz system ; Machine learning ; Markov processes ; Mathematical analysis ; Neural networks ; Nonlinear systems ; Nonlinearity ; Operators (mathematics) ; Stochastic systems ; Tensors</subject><ispartof>arXiv.org, 2024-03</ispartof><rights>2024. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>776,780</link.rule.ids></links><search><creatorcontrib>Preston Rozwood</creatorcontrib><creatorcontrib>Mehrez, Edward</creatorcontrib><creatorcontrib>Paehler, Ludger</creatorcontrib><creatorcontrib>Sun, Wen</creatorcontrib><creatorcontrib>Brunton, Steven L</creatorcontrib><title>Koopman-Assisted Reinforcement Learning</title><title>arXiv.org</title><description>The Bellman equation and its continuous form, the Hamilton-Jacobi-Bellman (HJB) equation, are ubiquitous in reinforcement learning (RL) and control theory. However, these equations quickly become intractable for systems with high-dimensional states and nonlinearity. This paper explores the connection between the data-driven Koopman operator and Markov Decision Processes (MDPs), resulting in the development of two new RL algorithms to address these limitations. We leverage Koopman operator techniques to lift a nonlinear system into new coordinates where the dynamics become approximately linear, and where HJB-based methods are more tractable. In particular, the Koopman operator is able to capture the expectation of the time evolution of the value function of a given system via linear dynamics in the lifted coordinates. By parameterizing the Koopman operator with the control actions, we construct a ``Koopman tensor'' that facilitates the estimation of the optimal value function. Then, a transformation of Bellman's framework in terms of the Koopman tensor enables us to reformulate two max-entropy RL algorithms: soft value iteration and soft actor-critic (SAC). This highly flexible framework can be used for deterministic or stochastic systems as well as for discrete or continuous-time dynamics. Finally, we show that these Koopman Assisted Reinforcement Learning (KARL) algorithms attain state-of-the-art (SOTA) performance with respect to traditional neural network-based SAC and linear quadratic regulator (LQR) baselines on four controlled dynamical systems: a linear state-space system, the Lorenz system, fluid flow past a cylinder, and a double-well potential with non-isotropic stochastic forcing.</description><subject>Algorithms</subject><subject>Control theory</subject><subject>Dynamical systems</subject><subject>Fluid flow</subject><subject>Linear quadratic regulator</subject><subject>Lorenz system</subject><subject>Machine learning</subject><subject>Markov processes</subject><subject>Mathematical analysis</subject><subject>Neural networks</subject><subject>Nonlinear systems</subject><subject>Nonlinearity</subject><subject>Operators (mathematics)</subject><subject>Stochastic systems</subject><subject>Tensors</subject><issn>2331-8422</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>ABUWG</sourceid><sourceid>AFKRA</sourceid><sourceid>AZQEC</sourceid><sourceid>BENPR</sourceid><sourceid>CCPQU</sourceid><sourceid>DWQXO</sourceid><recordid>eNpjYuA0MjY21LUwMTLiYOAtLs4yMDAwMjM3MjU15mRQ987PL8hNzNN1LC7OLC5JTVEISs3MS8svSk7NTc0rUfBJTSzKy8xL52FgTUvMKU7lhdLcDMpuriHOHroFRfmFpanFJfFZ-aVFeUCpeCNLY3MTU0MDIwtj4lQBAIzeMAE</recordid><startdate>20240304</startdate><enddate>20240304</enddate><creator>Preston Rozwood</creator><creator>Mehrez, Edward</creator><creator>Paehler, Ludger</creator><creator>Sun, Wen</creator><creator>Brunton, Steven L</creator><general>Cornell University Library, arXiv.org</general><scope>8FE</scope><scope>8FG</scope><scope>ABJCF</scope><scope>ABUWG</scope><scope>AFKRA</scope><scope>AZQEC</scope><scope>BENPR</scope><scope>BGLVJ</scope><scope>CCPQU</scope><scope>DWQXO</scope><scope>HCIFZ</scope><scope>L6V</scope><scope>M7S</scope><scope>PIMPY</scope><scope>PQEST</scope><scope>PQQKQ</scope><scope>PQUKI</scope><scope>PRINS</scope><scope>PTHSS</scope></search><sort><creationdate>20240304</creationdate><title>Koopman-Assisted Reinforcement Learning</title><author>Preston Rozwood ; Mehrez, Edward ; Paehler, Ludger ; Sun, Wen ; Brunton, Steven L</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-proquest_journals_29374510283</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>Algorithms</topic><topic>Control theory</topic><topic>Dynamical systems</topic><topic>Fluid flow</topic><topic>Linear quadratic regulator</topic><topic>Lorenz system</topic><topic>Machine learning</topic><topic>Markov processes</topic><topic>Mathematical analysis</topic><topic>Neural networks</topic><topic>Nonlinear systems</topic><topic>Nonlinearity</topic><topic>Operators (mathematics)</topic><topic>Stochastic systems</topic><topic>Tensors</topic><toplevel>online_resources</toplevel><creatorcontrib>Preston Rozwood</creatorcontrib><creatorcontrib>Mehrez, Edward</creatorcontrib><creatorcontrib>Paehler, Ludger</creatorcontrib><creatorcontrib>Sun, Wen</creatorcontrib><creatorcontrib>Brunton, Steven L</creatorcontrib><collection>ProQuest SciTech Collection</collection><collection>ProQuest Technology Collection</collection><collection>Materials Science & Engineering Collection</collection><collection>ProQuest Central (Alumni Edition)</collection><collection>ProQuest Central UK/Ireland</collection><collection>ProQuest Central Essentials</collection><collection>ProQuest Central</collection><collection>Technology Collection</collection><collection>ProQuest One Community College</collection><collection>ProQuest Central Korea</collection><collection>SciTech Premium Collection</collection><collection>ProQuest Engineering Collection</collection><collection>Engineering Database</collection><collection>Publicly Available Content Database</collection><collection>ProQuest One Academic Eastern Edition (DO NOT USE)</collection><collection>ProQuest One Academic</collection><collection>ProQuest One Academic UKI Edition</collection><collection>ProQuest Central China</collection><collection>Engineering Collection</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Preston Rozwood</au><au>Mehrez, Edward</au><au>Paehler, Ludger</au><au>Sun, Wen</au><au>Brunton, Steven L</au><format>book</format><genre>document</genre><ristype>GEN</ristype><atitle>Koopman-Assisted Reinforcement Learning</atitle><jtitle>arXiv.org</jtitle><date>2024-03-04</date><risdate>2024</risdate><eissn>2331-8422</eissn><abstract>The Bellman equation and its continuous form, the Hamilton-Jacobi-Bellman (HJB) equation, are ubiquitous in reinforcement learning (RL) and control theory. However, these equations quickly become intractable for systems with high-dimensional states and nonlinearity. This paper explores the connection between the data-driven Koopman operator and Markov Decision Processes (MDPs), resulting in the development of two new RL algorithms to address these limitations. We leverage Koopman operator techniques to lift a nonlinear system into new coordinates where the dynamics become approximately linear, and where HJB-based methods are more tractable. In particular, the Koopman operator is able to capture the expectation of the time evolution of the value function of a given system via linear dynamics in the lifted coordinates. By parameterizing the Koopman operator with the control actions, we construct a ``Koopman tensor'' that facilitates the estimation of the optimal value function. Then, a transformation of Bellman's framework in terms of the Koopman tensor enables us to reformulate two max-entropy RL algorithms: soft value iteration and soft actor-critic (SAC). This highly flexible framework can be used for deterministic or stochastic systems as well as for discrete or continuous-time dynamics. Finally, we show that these Koopman Assisted Reinforcement Learning (KARL) algorithms attain state-of-the-art (SOTA) performance with respect to traditional neural network-based SAC and linear quadratic regulator (LQR) baselines on four controlled dynamical systems: a linear state-space system, the Lorenz system, fluid flow past a cylinder, and a double-well potential with non-isotropic stochastic forcing.</abstract><cop>Ithaca</cop><pub>Cornell University Library, arXiv.org</pub><oa>free_for_read</oa></addata></record>
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2024-03
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2937451028
source	Free E- Journals
subjects	Algorithms Control theory Dynamical systems Fluid flow Linear quadratic regulator Lorenz system Machine learning Markov processes Mathematical analysis Neural networks Nonlinear systems Nonlinearity Operators (mathematics) Stochastic systems Tensors
title	Koopman-Assisted Reinforcement Learning
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-25T07%3A07%3A08IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Koopman-Assisted%20Reinforcement%20Learning&rft.jtitle=arXiv.org&rft.au=Preston%20Rozwood&rft.date=2024-03-04&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2937451028%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2937451028&rft_id=info:pmid/&rfr_iscdi=true