Revisiting Approximate Dynamic Programming and its Convergence
Value iteration-based approximate/adaptive dynamic programming (ADP) as an approximate solution to infinite-horizon optimal control problems with deterministic dynamics and continuous state and action spaces is investigated. The learning iterations are decomposed into an outer loop and an inner loop. A relatively simple proof for the convergence of the outer-loop iterations to the optimal solution is provided using a novel idea with some new features. It presents an analogy between the value function during the iterations and the value function of a fixed-final-time optimal control problem. The inner loop is utilized to avoid the need for solving a set of nonlinear equations or a nonlinear optimization problem numerically at each iteration of ADP for the policy update. Sufficient conditions for the uniqueness of the solution to the policy update equation and for the convergence of the inner-loop iterations to that solution are obtained. The results are then formed into a learning algorithm for training a neurocontroller or creating a look-up table to be used for optimal control of nonlinear systems with different initial conditions. Finally, some of the features of the investigated method are numerically analyzed.
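To make the two-loop structure described in the abstract concrete, the sketch below implements it for an illustrative scalar problem: the outer loop performs the Bellman value update, and the inner loop solves the policy update equation by fixed-point iteration instead of a numerical nonlinear solve. This is a minimal sketch under assumptions of our own (control-affine dynamics x_{k+1} = f(x_k) + g(x_k) u_k, stage cost Q(x) + R u^2, and a tabular value function on a 1-D grid); the example system and every name in it are hypothetical, not taken from the paper.

```python
import numpy as np

# Illustrative scalar system: x_{k+1} = 0.9 x_k + 0.5 u_k, with stage
# cost Q(x) + R u^2. None of these numbers come from the paper.
f = lambda x: 0.9 * x      # drift term of the dynamics
g = lambda x: 0.5          # input gain (constant here)
Q = lambda x: x ** 2       # state cost
R = 1.0                    # control cost weight

grid = np.linspace(-2.0, 2.0, 201)   # sample states for a tabular V
V = np.zeros_like(grid)              # V_0 = 0, the usual value-iteration start

for outer in range(60):              # outer loop: Bellman value update
    dV = np.gradient(V, grid)        # V'(x) on the grid, fixed during the inner loop
    V_next = np.empty_like(V)
    for i, x in enumerate(grid):
        u = 0.0
        for _ in range(30):          # inner loop: policy-update fixed point
            # Iterates the stationarity condition
            #   2 R u + g(x) V'(f(x) + g(x) u) = 0,
            # rather than calling a nonlinear solver at each outer step.
            u = -g(x) * np.interp(f(x) + g(x) * u, grid, dV) / (2.0 * R)
        x_next = np.clip(f(x) + g(x) * u, grid[0], grid[-1])
        V_next[i] = Q(x) + R * u ** 2 + np.interp(x_next, grid, V)
    V = V_next

print("V(1.0) ~", np.interp(1.0, grid, V))   # converged cost-to-go at x = 1
```

For this linear example the inner iteration settles in a few steps; the paper's contribution is precisely the sufficient conditions under which the policy update equation has a unique solution and this inner loop converges for general nonlinear systems.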
Saved in:
Published in: | IEEE Transactions on Cybernetics, 2014-12, Vol. 44 (12), p. 2733-2743 |
---|---|
Author: | Heydari, Ali |
Format: | Article |
Language: | English |
Subjects: | Approximate dynamic programming; Approximation; Approximation methods; Convergence; Dynamic programming; Equations; Iterative methods; Learning; Mathematical analysis; Mathematical models; Nonlinear control systems; Optimal control; Optimization; Policies; Vectors |
Publisher: | IEEE (United States) |
Source: | IEEE Electronic Library (IEL) |
DOI: | 10.1109/TCYB.2014.2314612 |
ISSN: | 2168-2267 |
EISSN: | 2168-2275 |
PMID: | 24846687 |
CODEN: | ITCEB8 |
Online access: | Order full text |