Reinforcement Q-learning for optimal tracking control of linear discrete-time systems with unknown dynamics

In this paper, a novel approach based on the Q-learning algorithm is proposed to solve the infinite-horizon linear quadratic tracker (LQT) for unknown discrete-time systems in a causal manner. It is assumed that the reference trajectory is generated by a linear command generator system. An augmented system composed of the original system and the command generator is constructed, and it is shown that the value function for the LQT is quadratic in the state of the augmented system. Using this quadratic structure, a Bellman equation and an augmented algebraic Riccati equation (ARE) for solving the LQT are derived. In contrast to the standard solution of the LQT, which requires solving an ARE and a noncausal difference equation simultaneously, the proposed method obtains the optimal control input by solving only an augmented ARE. A Q-learning algorithm is developed to solve the augmented ARE online without any knowledge of the system dynamics or the command generator. Convergence to the optimal solution is shown. A simulation example is used to verify the effectiveness of the proposed control scheme.
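As a concrete illustration of the mechanics the abstract describes, the sketch below runs a plain policy-iteration Q-learning loop for a discounted LQT on the augmented state X_k = [x_k; r_k]: the Q-function is taken quadratic, Q(X_k, u_k) = [X_k; u_k]^T H [X_k; u_k], its parameters are estimated by least squares from the Bellman equation using measured data only, and the policy u_k = -K X_k is improved from the blocks of the learned H. This is a minimal NumPy sketch under assumed plant, command generator, weights, and tuning values; it is not the paper's simulation example, and the plant matrices appear only to generate data, mirroring the model-free setting.

```python
# Hedged sketch of policy-iteration Q-learning for a discounted LQT on an
# augmented system. All matrices and tuning values are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

# Assumed plant x_{k+1} = A x_k + B u_k, output y_k = C x_k.
A = np.array([[0.9, 0.1],
              [0.0, 0.8]])
B = np.array([[0.0],
              [1.0]])
C = np.array([[1.0, 0.0]])

# Assumed command generator r_{k+1} = F r_k (constant reference).
F = np.array([[1.0]])

Qy, R, gamma = np.array([[10.0]]), np.array([[1.0]]), 0.9

# Augmented dynamics X = [x; r]:  X_{k+1} = T X_k + B1 u_k.
T = np.block([[A, np.zeros((2, 1))],
              [np.zeros((1, 2)), F]])
B1 = np.vstack([B, np.zeros((1, 1))])
C1 = np.hstack([C, -np.eye(1)])          # tracking error e = C x - r = C1 X
Q1 = C1.T @ Qy @ C1                      # augmented state weight

nX, nu = T.shape[0], B1.shape[1]
nz = nX + nu                              # z = [X; u]

def phi(z):
    """Quadratic feature vector so that z' H z = phi(z) @ vech(H)."""
    f = []
    for i in range(nz):
        for j in range(i, nz):
            f.append(z[i] * z[j] * (1.0 if i == j else 2.0))
    return np.array(f)

def unvech(theta):
    """Rebuild the symmetric Q-function kernel H from its parameter vector."""
    H = np.zeros((nz, nz))
    idx = 0
    for i in range(nz):
        for j in range(i, nz):
            H[i, j] = H[j, i] = theta[idx]
            idx += 1
    return H

K = np.zeros((nu, nX))                    # initial admissible policy u = -K X
for it in range(30):
    # Policy evaluation: collect data with the current policy plus probing
    # noise, then solve the Bellman identity for H by least squares.
    Phi, target = [], []
    X = np.array([1.0, -1.0, 1.0])
    for k in range(200):
        u = -K @ X + 0.5 * rng.standard_normal(nu)
        Xn = T @ X + B1 @ u
        cost = X @ Q1 @ X + u @ R @ u
        un = -K @ Xn                      # next action under the evaluated policy
        Phi.append(phi(np.concatenate([X, u]))
                   - gamma * phi(np.concatenate([Xn, un])))
        target.append(cost)
        X = Xn
    theta, *_ = np.linalg.lstsq(np.array(Phi), np.array(target), rcond=None)
    H = unvech(theta)

    # Policy improvement from the Q-function blocks: u = -inv(H_uu) H_uX X.
    Huu, HuX = H[nX:, nX:], H[nX:, :nX]
    K_new = np.linalg.solve(Huu, HuX)
    if np.linalg.norm(K_new - K) < 1e-6:
        break
    K = K_new

print("learned feedback/feedforward gain K (u = -K [x; r]):\n", K)
```

The probing noise added to the control supplies the persistent excitation the least-squares step needs; without it the regression matrix loses rank and H cannot be identified.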


Bibliographic Details
Published in: Automatica (Oxford), 2014-04, Vol. 50 (4), p. 1167-1175
Main authors: Kiumarsi, Bahare; Lewis, Frank L.; Modares, Hamidreza; Karimpour, Ali; Naghibi-Sistani, Mohammad-Bagher
Format: Article
Language: English
Online access: Full text
DOI: 10.1016/j.automatica.2014.02.015
ISSN: 0005-1098
EISSN: 1873-2836
Source: Elsevier ScienceDirect Journals Complete
Subjects:
Adaptive systems
Algebraic Riccati equation
Applied sciences
Artificial intelligence
Computer science; control theory; systems
Control theory. Systems
Exact sciences and technology
Learning and adaptive systems
Linear quadratic tracker
Optimal control
Policy iteration
Reinforcement learning