Optimality Conditions for Long-Run Average Rewards With Underselectivity and Nonsmooth Features

We study three issues associated with the optimization of long-run average rewards of time-nonhomogeneous Markov processes in continuous time with continuous state spaces: 1) underselectivity, i.e., optimal policies do not depend on their actions in any finite time period; 2) the related issue of bias optimality, i.e., policies that optimize both long-run average and transient total rewards; and 3) the effects of a nonsmooth point of a value function on performance optimization. These issues require consideration of performance over the entire infinite horizon, and therefore are not easily solvable by dynamic programming, which works backwards in time and takes a local view at a particular time instant. In this paper, we take a different approach, called relative optimization theory, which is based on a direct comparison of the performance measures of any two policies. We derive tight necessary and sufficient optimality conditions that take underselectivity into consideration; we derive bias optimality conditions for both long-run average and transient rewards; and we show that the effect of a wide class of nonsmooth points of a value function, called semismooth points, on the long-run average performance is zero and can be ignored.
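The long-run average reward and the direct-comparison idea behind the abstract can be illustrated with a toy discrete-time, finite-state analogue (the paper itself treats continuous-time, continuous-state processes, so this is only a sketch; all names here are hypothetical, not from the paper). For an ergodic chain with transition matrix P and reward vector r, the long-run average is eta = pi @ r, where pi is the stationary distribution; relative optimization compares eta across two policies directly:

```python
import numpy as np

def long_run_average(P, r):
    """Long-run average reward of a stationary policy on an ergodic
    finite Markov chain: eta = pi @ r, where pi solves pi P = pi
    with sum(pi) = 1."""
    n = P.shape[0]
    # Stack the balance equations (P^T - I) pi = 0 with the
    # normalization sum(pi) = 1 and solve by least squares.
    A = np.vstack([P.T - np.eye(n), np.ones(n)])
    b = np.zeros(n + 1)
    b[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, b, rcond=None)
    return float(pi @ r)

# Two policies on a 2-state chain with the same rewards but
# different transition behavior.
r = np.array([1.0, 0.0])
P1 = np.array([[0.9, 0.1], [0.5, 0.5]])  # lingers in the rewarding state
P2 = np.array([[0.5, 0.5], [0.9, 0.1]])  # leaves it quickly
eta1 = long_run_average(P1, r)
eta2 = long_run_average(P2, r)
# Direct comparison: policy 1 is long-run-average better iff eta1 > eta2.
```

Note that eta is insensitive to behavior in any finite initial stretch (the chain forgets its start), which is the discrete-time shadow of the underselectivity issue the paper addresses.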

Bibliographic details
Published in: IEEE Transactions on Automatic Control, 2017-09, Vol. 62 (9), pp. 4318-4332
Author: Cao, Xi-Ren
Format: Article
Language: English
DOI: 10.1109/TAC.2017.2655487
ISSN: 0018-9286
EISSN: 1558-2523
Source: IEEE Electronic Library (IEL)
Subjects:
Aerospace electronics
Bias optimality
direct-comparison-based optimization
Dynamic programming
Generators
Itô–Tanaka formula
local time
Markov processes
Optimization
relative optimization
semismooth functions
state comparability
theory
Transient analysis
Viscosity
viscosity solution