Optimality Conditions for Long-Run Average Rewards With Underselectivity and Nonsmooth Features
We study three existing issues associated with optimization of long-run average rewards of time-nonhomogeneous Markov processes in continuous time with continuous state spaces: 1) the underselectivity, i.e., the optimal policies do not depend on their actions in any finite time period; 2) its related issue, the bias optimality, i.e., policies that optimize both long-run average and transient total rewards; and 3) the effects of a nonsmooth point of a value function on performance optimization. These issues require consideration of the performance over the entire infinite horizon, and therefore are not easily solvable by dynamic programming, which works backwards in time and takes a local view at a particular time instant. In this paper, we take a different approach, called relative optimization theory, which is based on a direct comparison of the performance measures of any two policies. We derive tight necessary and sufficient optimality conditions that take the underselectivity into consideration; we derive bias optimality conditions for both long-run average and transient rewards; and we show that the effect of a wide class of nonsmooth points, called semismooth points, of a value function on the long-run average performance is zero and can be ignored.
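As a toy illustration of the direct-comparison idea described in the abstract (a hypothetical sketch, not from the paper, which treats continuous-time, continuous-state processes): for a two-state discrete-time Markov chain, the long-run average reward of a policy is the reward vector weighted by the stationary distribution, so any two policies can be compared through this single number. Actions taken over a finite time period do not change it, which is the underselectivity phenomenon.

```python
# Hypothetical two-state example of comparing long-run average rewards of
# two policies directly. The paper's setting (continuous time, continuous
# state space, time-nonhomogeneous) is far more general; this only sketches
# the direct-comparison idea on the simplest possible chain.

def long_run_average(a, b, r0, r1):
    """Long-run average reward of a 2-state chain with transition matrix
    P = [[1-a, a], [b, 1-b]] and state rewards (r0, r1).
    The stationary distribution is pi = (b/(a+b), a/(a+b))."""
    pi0 = b / (a + b)
    pi1 = a / (a + b)
    return pi0 * r0 + pi1 * r1

# Policy A mixes the states evenly; policy B mostly stays in state 0.
eta_A = long_run_average(0.5, 0.5, 1.0, 3.0)  # pi = (0.5, 0.5)
eta_B = long_run_average(0.1, 0.9, 1.0, 3.0)  # pi = (0.9, 0.1)

# Direct comparison of the two performance measures: eta_A - eta_B > 0,
# so policy A is better in the long-run average sense. Note that altering
# either policy's actions on any finite initial segment leaves its eta
# unchanged -- the underselectivity issue the paper addresses.
print(eta_A - eta_B)
```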
Published in: | IEEE transactions on automatic control, 2017-09, Vol. 62 (9), p. 4318-4332 |
---|---|
Main author: | Cao, Xi-Ren |
Format: | Article |
Language: | eng |
Subjects: | Aerospace electronics; Bias optimality; direct-comparison-based optimization; Dynamic programming; Generators; Ito–Tanaka formula; local time; Markov processes; Optimization; relative optimization; semismooth functions; state comparability; theory; Transient analysis; Viscosity; viscosity solution |
Online access: | Order full text |
container_end_page | 4332 |
---|---|
container_issue | 9 |
container_start_page | 4318 |
container_title | IEEE transactions on automatic control |
container_volume | 62 |
creator | Cao, Xi-Ren |
description | We study three existing issues associated with optimization of long-run average rewards of time-nonhomogeneous Markov processes in continuous time with continuous state spaces: 1) the underselectivity, i.e., the optimal policies do not depend on their actions in any finite time period; 2) its related issue, the bias optimality, i.e., policies that optimize both long-run average and transient total rewards, and 3) the effects of a nonsmooth point of a value function on performance optimization. These issues require considerations of the performance in the entire period with an infinite horizon, and therefore are not easily solvable by dynamic programming, which works backwards in time and takes a local view at a particular time instant. In this paper, we take a different approach called the relative optimization theory, which is based on a direct comparison of the performance measures of any two policies. We derive tight necessary and sufficient optimality conditions that take the underselectivity into consideration; we derive bias optimality conditions for both long-run average and transient rewards; and we show that the effect of a wide class of nonsmooth points, called semismooth points, of a value function on the long-run average performance is zero and can be ignored. |
doi_str_mv | 10.1109/TAC.2017.2655487 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0018-9286 |
ispartof | IEEE transactions on automatic control, 2017-09, Vol.62 (9), p.4318-4332 |
issn | 0018-9286; 1558-2523 |
language | eng |
recordid | cdi_crossref_primary_10_1109_TAC_2017_2655487 |
source | IEEE Electronic Library (IEL) |
subjects | Aerospace electronics; Bias optimality; direct-comparison-based optimization; Dynamic programming; Generators; Ito–Tanaka formula; local time; Markov processes; Optimization; relative optimization; semismooth functions; state comparability; theory; Transient analysis; Viscosity; viscosity solution |
title | Optimality Conditions for Long-Run Average Rewards With Underselectivity and Nonsmooth Features |