Reinforcement Learning for Adaptive Optimal Stationary Control of Linear Stochastic Systems
This article studies the adaptive optimal stationary control of continuous-time linear stochastic systems with both additive and multiplicative noises, using reinforcement learning techniques. Based on policy iteration, a novel off-policy reinforcement learning algorithm, named optimistic least-squares-based policy iteration, is proposed, which is able to find iteratively near-optimal policies of the adaptive optimal stationary control problem directly from input/state data without explicitly identifying any system matrices, starting from an initial admissible control policy. The solutions given by the proposed optimistic least-squares-based policy iteration are proved to converge to a small neighborhood of the optimal solution with probability one, under mild conditions. The application of the proposed algorithm to a triple inverted pendulum example validates its feasibility and effectiveness.
Saved in:

| Published in: | IEEE Transactions on Automatic Control, 2023-04, Vol. 68 (4), p. 2383-2390 |
|---|---|
| Main authors: | Pang, Bo; Jiang, Zhong-Ping |
| Format: | Article |
| Language: | English |
| Online access: | Order full text |
container_end_page | 2390 |
---|---|
container_issue | 4 |
container_start_page | 2383 |
container_title | IEEE transactions on automatic control |
container_volume | 68 |
creator | Pang, Bo; Jiang, Zhong-Ping |
description | This article studies the adaptive optimal stationary control of continuous-time linear stochastic systems with both additive and multiplicative noises, using reinforcement learning techniques. Based on policy iteration, a novel off-policy reinforcement learning algorithm, named optimistic least-squares-based policy iteration, is proposed, which is able to find iteratively near-optimal policies of the adaptive optimal stationary control problem directly from input/state data without explicitly identifying any system matrices, starting from an initial admissible control policy. The solutions given by the proposed optimistic least-squares-based policy iteration are proved to converge to a small neighborhood of the optimal solution with probability one, under mild conditions. The application of the proposed algorithm to a triple inverted pendulum example validates its feasibility and effectiveness. |
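The policy-iteration loop that the proposed method builds on can be sketched in its classical model-based form (Kleinman's algorithm for the deterministic LQR special case). This is only an illustrative sketch: the matrices `A`, `B`, `Q`, `R` and the double-integrator numbers below are hypothetical, not from the paper, and the paper's actual contribution is to carry out the policy-evaluation step from input/state data via least squares, without knowing the system matrices, in the presence of additive and multiplicative noise.

```python
import numpy as np
from scipy.linalg import solve_continuous_lyapunov, solve_continuous_are

def kleinman_pi(A, B, Q, R, K0, iters=10):
    """Model-based policy iteration for deterministic continuous-time LQR.

    Starting from a stabilizing gain K0, alternate:
      policy evaluation:  (A - B K)^T P + P (A - B K) = -(Q + K^T R K)
      policy improvement: K <- R^{-1} B^T P
    """
    K = K0
    for _ in range(iters):
        Ac = A - B @ K
        # Lyapunov equation Ac^T P + P Ac = -(Q + K^T R K)
        P = solve_continuous_lyapunov(Ac.T, -(Q + K.T @ R @ K))
        K = np.linalg.solve(R, B.T @ P)  # greedy improvement step
    return P, K

# Hypothetical double-integrator example (not from the paper).
A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[1.0]])
K0 = np.array([[1.0, 2.0]])  # any initial stabilizing gain (A - B K0 is Hurwitz)

P, K = kleinman_pi(A, B, Q, R, K0)
P_are = solve_continuous_are(A, B, Q, R)  # Riccati solution for comparison
```

Each iteration solves a linear Lyapunov equation instead of the quadratic Riccati equation, and the iterates converge to the Riccati solution; the paper's off-policy variant replaces the Lyapunov solve with a least-squares fit to trajectory data, starting likewise from an admissible policy.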
doi_str_mv | 10.1109/TAC.2022.3172250 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0018-9286 |
ispartof | IEEE transactions on automatic control, 2023-04, Vol.68 (4), p.2383-2390 |
issn | 0018-9286; 1558-2523 |
language | eng |
recordid | cdi_proquest_journals_2792135156 |
source | IEEE Electronic Library (IEL) |
subjects | Adaptive control; Adaptive optimal control; Algorithms; data-driven control; Heuristic algorithms; Least squares; Machine learning; Optimal control; Performance analysis; policy iteration; Process control; Reinforcement learning; robustness; stochastic control; Stochastic processes; Stochastic systems |
title | Reinforcement Learning for Adaptive Optimal Stationary Control of Linear Stochastic Systems |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-25T16%3A41%3A44IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Reinforcement%20Learning%20for%20Adaptive%20Optimal%20Stationary%20Control%20of%20Linear%20Stochastic%20Systems&rft.jtitle=IEEE%20transactions%20on%20automatic%20control&rft.au=Pang,%20Bo&rft.date=2023-04-01&rft.volume=68&rft.issue=4&rft.spage=2383&rft.epage=2390&rft.pages=2383-2390&rft.issn=0018-9286&rft.eissn=1558-2523&rft.coden=IETAA9&rft_id=info:doi/10.1109/TAC.2022.3172250&rft_dat=%3Cproquest_RIE%3E2792135156%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2792135156&rft_id=info:pmid/&rft_ieee_id=9767669&rfr_iscdi=true |