Revisiting Approximate Dynamic Programming and its Convergence
Value iteration-based approximate/adaptive dynamic programming (ADP) as an approximate solution to infinite-horizon optimal control problems with deterministic dynamics and continuous state and action spaces is investigated. The learning iterations are decomposed into an outer loop and an inner loop. A relatively simple proof for the convergence of the outer-loop iterations to the optimal solution is provided using a novel idea with some new features. It presents an analogy between the value function during the iterations and the value function of a fixed-final-time optimal control problem. The inner loop is utilized to avoid the need for solving a set of nonlinear equations or a nonlinear optimization problem numerically at each iteration of ADP for the policy update. Sufficient conditions for the uniqueness of the solution to the policy update equation and for the convergence of the inner-loop iterations to that solution are obtained. The results are then formed into a learning algorithm for training a neurocontroller or creating a look-up table to be used for optimal control of nonlinear systems with different initial conditions. Finally, some of the features of the investigated method are numerically analyzed.
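To make the two-loop structure described in the abstract concrete, the sketch below implements it for an illustrative scalar problem: the outer loop performs the Bellman value update, and the inner loop solves the policy update equation by fixed-point iteration instead of a numerical nonlinear solve. This is a minimal sketch under assumptions of our own (control-affine dynamics x_{k+1} = f(x_k) + g(x_k) u_k, stage cost Q(x) + R u^2, and a tabular value function on a 1-D grid); the example system and every name in it are hypothetical, not taken from the paper.

```python
import numpy as np

# Illustrative scalar system: x_{k+1} = 0.9 x_k + 0.5 u_k, with stage
# cost Q(x) + R u^2. None of these numbers come from the paper.
f = lambda x: 0.9 * x      # drift term of the dynamics
g = lambda x: 0.5          # input gain (constant here)
Q = lambda x: x ** 2       # state cost
R = 1.0                    # control cost weight

grid = np.linspace(-2.0, 2.0, 201)   # sample states for a tabular V
V = np.zeros_like(grid)              # V_0 = 0, the usual value-iteration start

for outer in range(60):              # outer loop: Bellman value update
    dV = np.gradient(V, grid)        # V'(x) on the grid, fixed during the inner loop
    V_next = np.empty_like(V)
    for i, x in enumerate(grid):
        u = 0.0
        for _ in range(30):          # inner loop: policy-update fixed point
            # Iterates the stationarity condition
            #   2 R u + g(x) V'(f(x) + g(x) u) = 0,
            # rather than calling a nonlinear solver at each outer step.
            u = -g(x) * np.interp(f(x) + g(x) * u, grid, dV) / (2.0 * R)
        x_next = np.clip(f(x) + g(x) * u, grid[0], grid[-1])
        V_next[i] = Q(x) + R * u ** 2 + np.interp(x_next, grid, V)
    V = V_next

print("V(1.0) ~", np.interp(1.0, grid, V))   # converged cost-to-go at x = 1
```

For this linear example the inner iteration settles in a few steps; the paper's contribution is precisely the sufficient conditions under which the policy update equation has a unique solution and this inner loop converges for general nonlinear systems.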
Saved in:
Published in: | IEEE Transactions on Cybernetics, 2014-12, Vol. 44 (12), p. 2733-2743 |
---|---|
Author: | Heydari, Ali |
Format: | Article |
Language: | English |
Subjects: | Approximate dynamic programming; Approximation; Approximation methods; Convergence; Dynamic programming; Equations; Iterative methods; Learning; Mathematical analysis; Mathematical models; Nonlinear control systems; Optimal control; Optimization; Policies; Vectors |
Publisher: | IEEE (United States) |
Source: | IEEE Electronic Library (IEL) |
DOI: | 10.1109/TCYB.2014.2314612 |
ISSN: | 2168-2267 |
EISSN: | 2168-2275 |
PMID: | 24846687 |
CODEN: | ITCEB8 |
Online access: | Order full text |