Controlled Markov Processes With Safety State Constraints

This paper considers a Markov decision process (MDP) model with safety state constraints, which specify polytopic invariance constraints on the state probability distribution (pd) for all time epochs. Typically, in the MDP framework, safety is addressed indirectly by penalizing failure states through the reward function. However, such an approach does not allow imposing hard constraints on the state pd, which could be an issue for practical applications where the chance of failure must be limited to prescribed bounds. In this paper, we explicitly separate state constraints from the reward function. We provide analysis and synthesis methods to impose generalized safety constraints at all time epochs, unlike current constrained MDP approaches where such constraints can only be imposed on the stationary distributions. We show that, contrary to the unconstrained MDP policies, optimal safe MDP policies depend on the initial state pd. We present novel algorithms for both finite- and infinite-horizon MDPs to synthesize feasible decision-making policies that satisfy safety constraints for all time epochs and ensure that the performance is above a computable lower bound. Linear programming implementations of the proposed algorithms are developed, which are formulated by using the duality theory of convex optimization. A swarm control simulation example is also provided to demonstrate the use of proposed algorithms.
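To make the "safety state constraint" concrete: it is a hard polytopic condition on the state probability distribution at every epoch, not a soft reward penalty. Below is a minimal pure-Python sketch, with an entirely hypothetical transition matrix and bound, that propagates a distribution under a fixed policy and verifies the constraint at each epoch (here the constraint simply caps the probability of a designated failure state):

```python
# Toy sketch (hypothetical numbers): a 3-state Markov chain under a fixed policy.
# The paper's safety constraint is polytopic -- H x_t <= h on the state
# distribution x_t at EVERY epoch, not just in steady state. Here the
# constraint picks out a failure state (index 2) and caps its probability.

P = [  # row-stochastic transition matrix: P[i][j] = Pr(next = j | current = i)
    [0.90, 0.08, 0.02],
    [0.10, 0.85, 0.05],
    [0.00, 0.50, 0.50],
]

def propagate(x, P):
    """One step of the distribution dynamics, row-vector convention: x' = x P."""
    n = len(x)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]

def is_safe_trajectory(x0, P, fail_state=2, bound=0.2, T=50):
    """Check the hard constraint x_t[fail_state] <= bound for t = 0..T."""
    x = list(x0)
    for _ in range(T + 1):
        if x[fail_state] > bound + 1e-12:
            return False
        x = propagate(x, P)
    return True

print(is_safe_trajectory([1.0, 0.0, 0.0], P))   # nominal start -> True
print(is_safe_trajectory([0.0, 0.0, 1.0], P))   # starts in the failure state -> False
```

Note this only checks a given closed-loop chain; the paper's contribution is the converse direction, synthesizing the policy so that the constraint holds by construction for all epochs.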

Detailed description

Bibliographic details
Published in: IEEE transactions on automatic control, 2019-03, Vol. 64 (3), p. 1003-1018
Main authors: Chamie, Mahmoud El; Yu, Yue; Acikmese, Behcet; Ono, Masahiro
Format: Article
Language: English
DOI: 10.1109/TAC.2018.2849556
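The abstract's finite-horizon problem, maximizing reward subject to a hard safety bound at every epoch, can be illustrated on a toy instance. All numbers below are hypothetical, and the sketch enumerates deterministic time-varying policies by brute force, whereas the paper synthesizes randomized policies via linear programming and convex duality; the point is only to show the constrained optimum never exceeds the unconstrained one and can be strictly worse:

```python
from itertools import product

# Hypothetical toy instance: 2 states (0 = nominal, 1 = failure-prone), 2 actions.
# T[a][i][j] = Pr(next = j | state = i, action = a); r[i][a] = one-step reward.
T = [
    [[1.0, 0.0], [1.0, 0.0]],   # action 0: conservative, drives all mass to state 0
    [[0.3, 0.7], [0.2, 0.8]],   # action 1: rewarding but risky
]
r = [[1.0, 2.0], [0.0, 0.0]]
N = 3                            # horizon: epochs t = 0..N
BOUND = 0.4                      # hard cap on Pr(state 1) at every epoch

def evaluate(policy, x0):
    """policy[t][i] = action taken in state i at epoch t. Returns (reward, safe?)."""
    x, total, safe = list(x0), 0.0, True
    for t in range(N):
        safe &= x[1] <= BOUND + 1e-12
        total += sum(x[i] * r[i][policy[t][i]] for i in range(2))
        x = [sum(x[i] * T[policy[t][i]][i][j] for i in range(2)) for j in range(2)]
    safe &= x[1] <= BOUND + 1e-12           # the terminal distribution must be safe too
    return total, safe

def best(x0, require_safe):
    """Brute-force the best deterministic time-varying policy (2^(2N) of them)."""
    vals = [v for pol in product(product(range(2), repeat=2), repeat=N)
            for v, ok in [evaluate(pol, x0)] if ok or not require_safe]
    return max(vals) if vals else None       # None would mean no feasible safe policy

x0 = [1.0, 0.0]
print(best(x0, require_safe=False))          # unconstrained optimum
print(best(x0, require_safe=True))           # safe optimum: lower, since action 1 violates the cap
```

In this instance the risky action immediately pushes 0.7 of the probability mass into the constrained state, so every safe policy must stay conservative; the gap between the two printed values is exactly the price of the hard safety constraint.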
ISSN: 0018-9286
EISSN: 1558-2523
Source: IEEE Xplore
Subjects:
agents and autonomous systems
Algorithms
Computational geometry
Computer simulation
constrained control
Constraint modelling
Control simulation
controlled Markov chains
Convexity
Decision making
Decision theory
Dynamic programming
Linear programming
Lower bounds
Markov chains
Markov decision processes
Markov processes
Optimization
Policies
Probability distribution
Safety
stochastic optimal control