Reinforcement Learning-Based Linear Quadratic Regulation of Continuous-Time Systems Using Dynamic Output Feedback
In this paper, we propose a model-free solution to the linear quadratic regulation (LQR) problem of continuous-time systems based on reinforcement learning using dynamic output feedback. The design objective is to learn the optimal control parameters by using only the measurable input-output data, without requiring model information.
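For orientation, a minimal model-based sketch (Python, with a hypothetical second-order system that is not taken from the paper) of the LQR benchmark referred to above: the optimal gain obtained from the continuous-time algebraic Riccati equation, together with a Kleinman-style policy iteration whose gain iterates converge to that same gain. The paper's algorithms reach this solution model-free, from filtered input-output data only; the sketch below does not reproduce that output-feedback machinery.

```python
# Illustrative sketch only: model-based LQR via the continuous-time algebraic
# Riccati equation (ARE), plus a Kleinman-style policy iteration converging to
# the same gain. The system matrices are hypothetical; the paper's algorithms
# are model-free and use only measured input-output data.
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Hypothetical second-order system (not from the paper).
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)          # state weighting
R = np.array([[1.0]])  # input weighting

# Benchmark: P solves A'P + PA - P B R^{-1} B' P + Q = 0, and K* = R^{-1} B' P.
P_are = solve_continuous_are(A, B, Q, R)
K_star = np.linalg.solve(R, B.T @ P_are)

# Kleinman-style policy iteration: evaluate the current gain via a Lyapunov
# equation, then improve it; the iterates converge to the ARE gain K*.
K = np.zeros((1, 2))   # A is Hurwitz here, so the zero gain is stabilizing
for _ in range(50):
    A_cl = A - B @ K
    # Policy evaluation: (A - B K)' P + P (A - B K) = -(Q + K' R K)
    P = solve_continuous_lyapunov(A_cl.T, -(Q + K.T @ R @ K))
    K_new = np.linalg.solve(R, B.T @ P)
    if np.linalg.norm(K_new - K) < 1e-10:
        K = K_new
        break
    K = K_new

print("ARE gain K*:", K_star)
print("PI gain  K :", K)  # agrees with K* to numerical precision
```

The value iteration variant described in the abstract avoids the need for an initially stabilizing gain; policy iteration is used here only because it is the simplest model-based analogue.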
Saved in:
Published in: | IEEE transactions on cybernetics 2020-11, Vol.50 (11), p.4670-4679 |
---|---|
Main authors: | Rizvi, Syed Ali Asad; Lin, Zongli |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 4679 |
---|---|
container_issue | 11 |
container_start_page | 4670 |
container_title | IEEE transactions on cybernetics |
container_volume | 50 |
creator | Rizvi, Syed Ali Asad; Lin, Zongli |
description | In this paper, we propose a model-free solution to the linear quadratic regulation (LQR) problem of continuous-time systems based on reinforcement learning using dynamic output feedback. The design objective is to learn the optimal control parameters by using only the measurable input-output data, without requiring model information. A state parametrization scheme is presented which reconstructs the system state based on the filtered input and output signals. Based on this parametrization, two new output feedback adaptive dynamic programming Bellman equations are derived for the LQR problem based on policy iteration and value iteration (VI). Unlike the existing output feedback methods for continuous-time systems, the need to apply discrete approximation is obviated. In contrast with the static output feedback controllers, the proposed method can also handle systems that are state feedback stabilizable but not static output feedback stabilizable. An advantage of this scheme is that it stands immune to the exploration bias issue. Moreover, it does not require a discounted cost function and, thus, ensures the closed-loop stability and the optimality of the solution. Compared with earlier output feedback results, the proposed VI method does not require an initially stabilizing policy. We show that the estimates of the control parameters converge to those obtained by solving the LQR algebraic Riccati equation. A comprehensive simulation study is carried out to verify the proposed algorithms. |
doi_str_mv | 10.1109/TCYB.2018.2886735 |
format | Article |
coden | ITCEB8 |
orcidid | https://orcid.org/0000-0003-1412-8841; https://orcid.org/0000-0003-1589-1443 |
pmid | 30605117 |
publisher | IEEE, United States |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2168-2267 |
ispartof | IEEE transactions on cybernetics, 2020-11, Vol.50 (11), p.4670-4679 |
issn | 2168-2267 (ISSN); 2168-2275 (EISSN) |
language | eng |
recordid | cdi_ieee_primary_8600378 |
source | IEEE Electronic Library (IEL) |
subjects | Adaptive dynamic programming (ADP); Algorithms; Continuous time systems; Cost function; Dynamic programming; Feedback control; Iterative methods; Learning; Linear quadratic regulator; linear quadratic regulator (LQR); Mathematical model; Mathematical models; Optimal control; Optimization; Output feedback; Parameter estimation; Parameterization; reinforcement learning (RL); Riccati equation; Stability analysis; State feedback |
title | Reinforcement Learning-Based Linear Quadratic Regulation of Continuous-Time Systems Using Dynamic Output Feedback |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T03%3A19%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Reinforcement%20Learning-Based%20Linear%20Quadratic%20Regulation%20of%20Continuous-Time%20Systems%20Using%20Dynamic%20Output%20Feedback&rft.jtitle=IEEE%20transactions%20on%20cybernetics&rft.au=Rizvi,%20Syed%20Ali%20Asad&rft.date=2020-11-01&rft.volume=50&rft.issue=11&rft.spage=4670&rft.epage=4679&rft.pages=4670-4679&rft.issn=2168-2267&rft.eissn=2168-2275&rft.coden=ITCEB8&rft_id=info:doi/10.1109/TCYB.2018.2886735&rft_dat=%3Cproquest_RIE%3E2456527746%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2456527746&rft_id=info:pmid/30605117&rft_ieee_id=8600378&rfr_iscdi=true |