Optimality Conditions for Long-Run Average Rewards With Underselectivity and Nonsmooth Features
We study three existing issues associated with optimization of long-run average rewards of time-nonhomogeneous Markov processes in continuous time with continuous state spaces: 1) the underselectivity, i.e., the optimal policies do not depend on their actions in any finite time period; 2) its related issue, the bias optimality, i.e., policies that optimize both long-run average and transient total rewards; and 3) the effects of a nonsmooth point of a value function on performance optimization. These issues require consideration of the performance over the entire infinite horizon, and therefore are not easily solvable by dynamic programming, which works backwards in time and takes a local view at a particular time instant. In this paper, we take a different approach, called relative optimization theory, which is based on a direct comparison of the performance measures of any two policies. We derive tight necessary and sufficient optimality conditions that take the underselectivity into consideration; we derive bias optimality conditions for both long-run average and transient rewards; and we show that the effect of a wide class of nonsmooth points, called semismooth points, of a value function on the long-run average performance is zero and can be ignored.
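As a toy illustration of the direct-comparison idea described in the abstract (a hypothetical sketch, not from the paper, which treats continuous-time, continuous-state processes): for a two-state discrete-time Markov chain, the long-run average reward of a policy is the reward vector weighted by the stationary distribution, so any two policies can be compared through this single number. Actions taken over a finite time period do not change it, which is the underselectivity phenomenon.

```python
# Hypothetical two-state example of comparing long-run average rewards of
# two policies directly. The paper's setting (continuous time, continuous
# state space, time-nonhomogeneous) is far more general; this only sketches
# the direct-comparison idea on the simplest possible chain.

def long_run_average(a, b, r0, r1):
    """Long-run average reward of a 2-state chain with transition matrix
    P = [[1-a, a], [b, 1-b]] and state rewards (r0, r1).
    The stationary distribution is pi = (b/(a+b), a/(a+b))."""
    pi0 = b / (a + b)
    pi1 = a / (a + b)
    return pi0 * r0 + pi1 * r1

# Policy A mixes the states evenly; policy B mostly stays in state 0.
eta_A = long_run_average(0.5, 0.5, 1.0, 3.0)  # pi = (0.5, 0.5)
eta_B = long_run_average(0.1, 0.9, 1.0, 3.0)  # pi = (0.9, 0.1)

# Direct comparison of the two performance measures: eta_A - eta_B > 0,
# so policy A is better in the long-run average sense. Note that altering
# either policy's actions on any finite initial segment leaves its eta
# unchanged -- the underselectivity issue the paper addresses.
print(eta_A - eta_B)
```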
Published in: | IEEE transactions on automatic control, 2017-09, Vol. 62 (9), p. 4318-4332 |
---|---|
Main author: | Cao, Xi-Ren |
Format: | Article |
Language: | eng |
Subjects: | Aerospace electronics; Bias optimality; direct-comparison-based optimization; Dynamic programming; Generators; Ito–Tanaka formula; local time; Markov processes; Optimization; relative optimization; semismooth functions; state comparability; theory; Transient analysis; Viscosity; viscosity solution |
Online access: | Order full text |
container_end_page | 4332 |
---|---|
container_issue | 9 |
container_start_page | 4318 |
container_title | IEEE transactions on automatic control |
container_volume | 62 |
creator | Cao, Xi-Ren |
description | We study three existing issues associated with optimization of long-run average rewards of time-nonhomogeneous Markov processes in continuous time with continuous state spaces: 1) the underselectivity, i.e., the optimal policies do not depend on their actions in any finite time period; 2) its related issue, the bias optimality, i.e., policies that optimize both long-run average and transient total rewards, and 3) the effects of a nonsmooth point of a value function on performance optimization. These issues require considerations of the performance in the entire period with an infinite horizon, and therefore are not easily solvable by dynamic programming, which works backwards in time and takes a local view at a particular time instant. In this paper, we take a different approach called the relative optimization theory, which is based on a direct comparison of the performance measures of any two policies. We derive tight necessary and sufficient optimality conditions that take the underselectivity into consideration; we derive bias optimality conditions for both long-run average and transient rewards; and we show that the effect of a wide class of nonsmooth points, called semismooth points, of a value function on the long-run average performance is zero and can be ignored. |
doi_str_mv | 10.1109/TAC.2017.2655487 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0018-9286 |
ispartof | IEEE transactions on automatic control, 2017-09, Vol.62 (9), p.4318-4332 |
issn | 0018-9286; 1558-2523 |
language | eng |
recordid | cdi_crossref_primary_10_1109_TAC_2017_2655487 |
source | IEEE Electronic Library (IEL) |
subjects | Aerospace electronics; Bias optimality; direct-comparison-based optimization; Dynamic programming; Generators; Ito–Tanaka formula; local time; Markov processes; Optimization; relative optimization; semismooth functions; state comparability; theory; Transient analysis; Viscosity; viscosity solution |
title | Optimality Conditions for Long-Run Average Rewards With Underselectivity and Nonsmooth Features |