Reinforcement Learning-Based Linear Quadratic Regulation of Continuous-Time Systems Using Dynamic Output Feedback
In this paper, we propose a model-free solution to the linear quadratic regulation (LQR) problem of continuous-time systems based on reinforcement learning using dynamic output feedback. The design objective is to learn the optimal control parameters by using only the measurable input-output data, without requiring model information.
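For orientation, a minimal model-based sketch (Python, with a hypothetical second-order system that is not taken from the paper) of the LQR benchmark referred to above: the optimal gain obtained from the continuous-time algebraic Riccati equation, together with a Kleinman-style policy iteration whose gain iterates converge to that same gain. The paper's algorithms reach this solution model-free, from filtered input-output data only; the sketch below does not reproduce that output-feedback machinery.

```python
# Illustrative sketch only: model-based LQR via the continuous-time algebraic
# Riccati equation (ARE), plus a Kleinman-style policy iteration converging to
# the same gain. The system matrices are hypothetical; the paper's algorithms
# are model-free and use only measured input-output data.
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Hypothetical second-order system (not from the paper).
A = np.array([[0.0, 1.0],
              [-2.0, -3.0]])
B = np.array([[0.0],
              [1.0]])
Q = np.eye(2)          # state weighting
R = np.array([[1.0]])  # input weighting

# Benchmark: P solves A'P + PA - P B R^{-1} B' P + Q = 0, and K* = R^{-1} B' P.
P_are = solve_continuous_are(A, B, Q, R)
K_star = np.linalg.solve(R, B.T @ P_are)

# Kleinman-style policy iteration: evaluate the current gain via a Lyapunov
# equation, then improve it; the iterates converge to the ARE gain K*.
K = np.zeros((1, 2))   # A is Hurwitz here, so the zero gain is stabilizing
for _ in range(50):
    A_cl = A - B @ K
    # Policy evaluation: (A - B K)' P + P (A - B K) = -(Q + K' R K)
    P = solve_continuous_lyapunov(A_cl.T, -(Q + K.T @ R @ K))
    K_new = np.linalg.solve(R, B.T @ P)
    if np.linalg.norm(K_new - K) < 1e-10:
        K = K_new
        break
    K = K_new

print("ARE gain K*:", K_star)
print("PI gain  K :", K)  # agrees with K* to numerical precision
```

The value iteration variant described in the abstract avoids the need for an initially stabilizing gain; policy iteration is used here only because it is the simplest model-based analogue.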
Saved in:
Published in: | IEEE transactions on cybernetics 2020-11, Vol.50 (11), p.4670-4679 |
---|---|
Main authors: | Rizvi, Syed Ali Asad; Lin, Zongli |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 4679 |
---|---|
container_issue | 11 |
container_start_page | 4670 |
container_title | IEEE transactions on cybernetics |
container_volume | 50 |
creator | Rizvi, Syed Ali Asad; Lin, Zongli |
description | In this paper, we propose a model-free solution to the linear quadratic regulation (LQR) problem of continuous-time systems based on reinforcement learning using dynamic output feedback. The design objective is to learn the optimal control parameters by using only the measurable input-output data, without requiring model information. A state parametrization scheme is presented which reconstructs the system state based on the filtered input and output signals. Based on this parametrization, two new output feedback adaptive dynamic programming Bellman equations are derived for the LQR problem based on policy iteration and value iteration (VI). Unlike the existing output feedback methods for continuous-time systems, the need to apply discrete approximation is obviated. In contrast with the static output feedback controllers, the proposed method can also handle systems that are state feedback stabilizable but not static output feedback stabilizable. An advantage of this scheme is that it stands immune to the exploration bias issue. Moreover, it does not require a discounted cost function and, thus, ensures the closed-loop stability and the optimality of the solution. Compared with earlier output feedback results, the proposed VI method does not require an initially stabilizing policy. We show that the estimates of the control parameters converge to those obtained by solving the LQR algebraic Riccati equation. A comprehensive simulation study is carried out to verify the proposed algorithms. |
doi_str_mv | 10.1109/TCYB.2018.2886735 |
format | Article |
coden | ITCEB8 |
orcidid | https://orcid.org/0000-0003-1412-8841; https://orcid.org/0000-0003-1589-1443 |
pmid | 30605117 |
publisher | IEEE, United States |
rights | Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2020 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 2168-2267 |
ispartof | IEEE transactions on cybernetics, 2020-11, Vol.50 (11), p.4670-4679 |
issn | 2168-2267 (ISSN); 2168-2275 (EISSN) |
language | eng |
recordid | cdi_ieee_primary_8600378 |
source | IEEE Electronic Library (IEL) |
subjects | Adaptive dynamic programming (ADP); Algorithms; Continuous time systems; Cost function; Dynamic programming; Feedback control; Iterative methods; Learning; Linear quadratic regulator; linear quadratic regulator (LQR); Mathematical model; Mathematical models; Optimal control; Optimization; Output feedback; Parameter estimation; Parameterization; reinforcement learning (RL); Riccati equation; Stability analysis; State feedback |
title | Reinforcement Learning-Based Linear Quadratic Regulation of Continuous-Time Systems Using Dynamic Output Feedback |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-04T03%3A19%3A38IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Reinforcement%20Learning-Based%20Linear%20Quadratic%20Regulation%20of%20Continuous-Time%20Systems%20Using%20Dynamic%20Output%20Feedback&rft.jtitle=IEEE%20transactions%20on%20cybernetics&rft.au=Rizvi,%20Syed%20Ali%20Asad&rft.date=2020-11-01&rft.volume=50&rft.issue=11&rft.spage=4670&rft.epage=4679&rft.pages=4670-4679&rft.issn=2168-2267&rft.eissn=2168-2275&rft.coden=ITCEB8&rft_id=info:doi/10.1109/TCYB.2018.2886735&rft_dat=%3Cproquest_RIE%3E2456527746%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2456527746&rft_id=info:pmid/30605117&rft_ieee_id=8600378&rfr_iscdi=true |