Revisiting Approximate Dynamic Programming and its Convergence

Value iteration-based approximate/adaptive dynamic programming (ADP) as an approximate solution to infinite-horizon optimal control problems with deterministic dynamics and continuous state and action spaces is investigated. The learning iterations are decomposed into an outer loop and an inner loop. A relatively simple proof for the convergence of the outer-loop iterations to the optimal solution is provided using a novel idea with some new features. It presents an analogy between the value function during the iterations and the value function of a fixed-final-time optimal control problem. The inner loop is utilized to avoid the need for solving a set of nonlinear equations or a nonlinear optimization problem numerically at each iteration of ADP for the policy update. Sufficient conditions for the uniqueness of the solution to the policy update equation and for the convergence of the inner-loop iterations to that solution are obtained. Afterwards, the results are formed into a learning algorithm for training a neurocontroller or creating a look-up table to be used for optimal control of nonlinear systems with different initial conditions. Finally, some of the features of the investigated method are numerically analyzed.
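The two-loop structure described in the abstract can be made concrete with a short numerical sketch. The Python below assumes a scalar discrete-time control-affine system x_{k+1} = f(x_k) + g(x_k)u_k with running cost Qx^2 + Ru^2; for that class, minimizing the right-hand side of the value-iteration recursion yields the implicit policy-update equation u = -g(x) V'(f(x) + g(x)u) / (2R), which the inner loop solves by successive approximation rather than by a numerical optimizer, matching the role the abstract assigns it. The example dynamics, grid, and tolerances are illustrative assumptions, not taken from the paper.

```python
import numpy as np

# Minimal sketch of the outer/inner-loop scheme from the abstract, for a
# scalar discrete-time control-affine system (assumed example, not the
# paper's):  x_{k+1} = f(x_k) + g(x_k)*u_k,  running cost Q*x^2 + R*u^2.

f = lambda x: 0.8 * x + 0.1 * x**2   # drift term (assumed example system)
g = lambda x: 1.0                    # input gain (assumed)
Q, R = 1.0, 1.0                      # state and control cost weights

xs = np.linspace(-2.0, 2.0, 201)     # state grid for a look-up-table value fn
V = np.zeros_like(xs)                # V^0 = 0 initializes value iteration

def dVdx(V, x):
    """Central finite difference of the tabulated value function."""
    h = 1e-4
    return (np.interp(x + h, xs, V) - np.interp(x - h, xs, V)) / (2 * h)

for i in range(200):                           # outer loop: value iteration
    V_new = np.empty_like(V)
    for k, x in enumerate(xs):
        u = 0.0
        for _ in range(50):                    # inner loop: fixed-point
            x_next = f(x) + g(x) * u           # iteration on the implicit
            u_next = -g(x) * dVdx(V, x_next) / (2 * R)   # policy-update eq.
            if abs(u_next - u) < 1e-9:
                u = u_next
                break
            u = u_next
        x_next = f(x) + g(x) * u
        V_new[k] = Q * x**2 + R * u**2 + np.interp(x_next, xs, V)
    if np.max(np.abs(V_new - V)) < 1e-6:       # outer-loop convergence
        V = V_new
        break
    V = V_new
```

States that leave the grid are clamped by np.interp, which is adequate for a sketch, and the inner loop terminating at a fixed point relies on the policy-update map being contractive; sufficient conditions of that kind are what the paper establishes.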

Bibliographic Details

Published in: IEEE Transactions on Cybernetics, 2014-12, Vol. 44 (12), pp. 2733-2743
Author: Heydari, Ali
Format: Article
Language: English
DOI: 10.1109/TCYB.2014.2314612
ISSN: 2168-2267
EISSN: 2168-2275
PMID: 24846687
Publisher: IEEE (United States)
Source: IEEE Electronic Library (IEL)
Subjects: Approximate dynamic programming; Approximation; Approximation methods; Convergence; Dynamic programming; Equations; Iterative methods; Learning; Mathematical analysis; Mathematical model; Mathematical models; Nonlinear control systems; Optimal control; Optimization; Policies; Vectors