Reinforcement Learning-Based Linear Quadratic Regulation of Continuous-Time Systems Using Dynamic Output Feedback


Bibliographic Details
Published in: IEEE Transactions on Cybernetics, 2020-11, Vol. 50 (11), p. 4670-4679
Main authors: Rizvi, Syed Ali Asad; Lin, Zongli
Format: Article
Language: English
Abstract: In this paper, we propose a model-free solution to the linear quadratic regulation (LQR) problem of continuous-time systems based on reinforcement learning using dynamic output feedback. The design objective is to learn the optimal control parameters by using only the measurable input-output data, without requiring model information. A state parametrization scheme is presented which reconstructs the system state based on the filtered input and output signals. Based on this parametrization, two new output feedback adaptive dynamic programming Bellman equations are derived for the LQR problem based on policy iteration and value iteration (VI). Unlike the existing output feedback methods for continuous-time systems, the need to apply discrete approximation is obviated. In contrast with the static output feedback controllers, the proposed method can also handle systems that are state feedback stabilizable but not static output feedback stabilizable. An advantage of this scheme is that it stands immune to the exploration bias issue. Moreover, it does not require a discounted cost function and, thus, ensures the closed-loop stability and the optimality of the solution. Compared with earlier output feedback results, the proposed VI method does not require an initially stabilizing policy. We show that the estimates of the control parameters converge to those obtained by solving the LQR algebraic Riccati equation. A comprehensive simulation study is carried out to verify the proposed algorithms.
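For context, the abstract's references to the LQR cost, the optimal gain, and the algebraic Riccati equation correspond to the standard continuous-time formulation below. This is textbook material restated for readability, not an excerpt from the paper; the symbols A, B, Q, R, P, and K carry their usual meanings for the linear system dx/dt = Ax + Bu.

```latex
\begin{align}
  J &= \int_{0}^{\infty} \left( x^{\top} Q x + u^{\top} R u \right) \mathrm{d}t,
      \qquad Q \succeq 0,\; R \succ 0, \\
  0 &= A^{\top} P + P A - P B R^{-1} B^{\top} P + Q
      \qquad \text{(algebraic Riccati equation)}, \\
  u^{*} &= -K x, \qquad K = R^{-1} B^{\top} P .
\end{align}
```

The paper's convergence claim is that its model-free, output-feedback estimates approach the gain obtained by solving this equation with full model knowledge. A minimal sketch of that model-based reference computation, using SciPy and a hypothetical two-state system (the matrices below are illustrative placeholders, not taken from the paper), might look as follows:

```python
# Model-based continuous-time LQR baseline: solve the algebraic Riccati
# equation for a small example system. The system matrices are hypothetical;
# the paper's model-free output-feedback method is shown to converge to the
# same gain K without knowledge of A and B.
import numpy as np
from scipy.linalg import solve_continuous_are

# Hypothetical second-order system dx/dt = A x + B u (illustration only).
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])

# Quadratic cost weights: Q >= 0 penalizes the state, R > 0 the input.
Q = np.eye(2)
R = np.array([[1.0]])

# Solve A'P + PA - PBR^{-1}B'P + Q = 0 for the stabilizing P.
P = solve_continuous_are(A, B, Q, R)

# Optimal state-feedback gain K = R^{-1} B' P, so u = -K x.
K = np.linalg.solve(R, B.T @ P)

# The closed-loop matrix A - BK should be Hurwitz (eigenvalues in the
# open left half-plane), confirming closed-loop stability.
eigs = np.linalg.eigvals(A - B @ K)
print("P =\n", P)
print("K =", K)
print("closed-loop eigenvalues:", eigs)
assert np.all(eigs.real < 0), "closed-loop system is not stable"
```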
DOI: 10.1109/TCYB.2018.2886735
ISSN: 2168-2267
EISSN: 2168-2275
PMID: 30605117
CODEN: ITCEB8
Publisher: IEEE, United States
Source: IEEE Electronic Library (IEL)
Subjects:
Adaptive dynamic programming (ADP)
Algorithms
Continuous time systems
Cost function
Dynamic programming
Feedback control
Iterative methods
Learning
Linear quadratic regulator
linear quadratic regulator (LQR)
Mathematical model
Mathematical models
Optimal control
Optimization
Output feedback
Parameter estimation
Parameterization
reinforcement learning (RL)
Riccati equation
Stability analysis
State feedback