Controlled Markov Processes With Safety State Constraints

This paper considers a Markov decision process (MDP) model with safety state constraints, which specify polytopic invariance constraints on the state probability distribution (pd) for all time epochs. Typically, in the MDP framework, safety is addressed indirectly by penalizing failure states through the reward function. However, such an approach does not allow imposing hard constraints on the state pd, which could be an issue for practical applications where the chance of failure must be limited to prescribed bounds. In this paper, we explicitly separate state constraints from the reward function. We provide analysis and synthesis methods to impose generalized safety constraints at all time epochs, unlike current constrained MDP approaches where such constraints can only be imposed on the stationary distributions. We show that, contrary to the unconstrained MDP policies, optimal safe MDP policies depend on the initial state pd. We present novel algorithms for both finite- and infinite-horizon MDPs to synthesize feasible decision-making policies that satisfy safety constraints for all time epochs and ensure that the performance is above a computable lower bound. Linear programming implementations of the proposed algorithms are developed, which are formulated by using the duality theory of convex optimization. A swarm control simulation example is also provided to demonstrate the use of proposed algorithms.
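To make the "safety state constraint" concrete: it is a hard polytopic condition on the state probability distribution at every epoch, not a soft reward penalty. Below is a minimal pure-Python sketch, with an entirely hypothetical transition matrix and bound, that propagates a distribution under a fixed policy and verifies the constraint at each epoch (here the constraint simply caps the probability of a designated failure state):

```python
# Toy sketch (hypothetical numbers): a 3-state Markov chain under a fixed policy.
# The paper's safety constraint is polytopic -- H x_t <= h on the state
# distribution x_t at EVERY epoch, not just in steady state. Here the
# constraint picks out a failure state (index 2) and caps its probability.

P = [  # row-stochastic transition matrix: P[i][j] = Pr(next = j | current = i)
    [0.90, 0.08, 0.02],
    [0.10, 0.85, 0.05],
    [0.00, 0.50, 0.50],
]

def propagate(x, P):
    """One step of the distribution dynamics, row-vector convention: x' = x P."""
    n = len(x)
    return [sum(x[i] * P[i][j] for i in range(n)) for j in range(n)]

def is_safe_trajectory(x0, P, fail_state=2, bound=0.2, T=50):
    """Check the hard constraint x_t[fail_state] <= bound for t = 0..T."""
    x = list(x0)
    for _ in range(T + 1):
        if x[fail_state] > bound + 1e-12:
            return False
        x = propagate(x, P)
    return True

print(is_safe_trajectory([1.0, 0.0, 0.0], P))   # nominal start -> True
print(is_safe_trajectory([0.0, 0.0, 1.0], P))   # starts in the failure state -> False
```

Note this only checks a given closed-loop chain; the paper's contribution is the converse direction, synthesizing the policy so that the constraint holds by construction for all epochs.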

Detailed description

Bibliographic details
Published in: IEEE transactions on automatic control, 2019-03, Vol. 64 (3), p. 1003-1018
Main authors: Chamie, Mahmoud El; Yu, Yue; Acikmese, Behcet; Ono, Masahiro
Format: Article
Language: English
DOI: 10.1109/TAC.2018.2849556
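The abstract's finite-horizon problem, maximizing reward subject to a hard safety bound at every epoch, can be illustrated on a toy instance. All numbers below are hypothetical, and the sketch enumerates deterministic time-varying policies by brute force, whereas the paper synthesizes randomized policies via linear programming and convex duality; the point is only to show the constrained optimum never exceeds the unconstrained one and can be strictly worse:

```python
from itertools import product

# Hypothetical toy instance: 2 states (0 = nominal, 1 = failure-prone), 2 actions.
# T[a][i][j] = Pr(next = j | state = i, action = a); r[i][a] = one-step reward.
T = [
    [[1.0, 0.0], [1.0, 0.0]],   # action 0: conservative, drives all mass to state 0
    [[0.3, 0.7], [0.2, 0.8]],   # action 1: rewarding but risky
]
r = [[1.0, 2.0], [0.0, 0.0]]
N = 3                            # horizon: epochs t = 0..N
BOUND = 0.4                      # hard cap on Pr(state 1) at every epoch

def evaluate(policy, x0):
    """policy[t][i] = action taken in state i at epoch t. Returns (reward, safe?)."""
    x, total, safe = list(x0), 0.0, True
    for t in range(N):
        safe &= x[1] <= BOUND + 1e-12
        total += sum(x[i] * r[i][policy[t][i]] for i in range(2))
        x = [sum(x[i] * T[policy[t][i]][i][j] for i in range(2)) for j in range(2)]
    safe &= x[1] <= BOUND + 1e-12           # the terminal distribution must be safe too
    return total, safe

def best(x0, require_safe):
    """Brute-force the best deterministic time-varying policy (2^(2N) of them)."""
    vals = [v for pol in product(product(range(2), repeat=2), repeat=N)
            for v, ok in [evaluate(pol, x0)] if ok or not require_safe]
    return max(vals) if vals else None       # None would mean no feasible safe policy

x0 = [1.0, 0.0]
print(best(x0, require_safe=False))          # unconstrained optimum
print(best(x0, require_safe=True))           # safe optimum: lower, since action 1 violates the cap
```

In this instance the risky action immediately pushes 0.7 of the probability mass into the constrained state, so every safe policy must stay conservative; the gap between the two printed values is exactly the price of the hard safety constraint.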
ISSN: 0018-9286
EISSN: 1558-2523
Source: IEEE Xplore
Subjects:
agents and autonomous systems
Algorithms
Computational geometry
Computer simulation
constrained control
Constraint modelling
Control simulation
controlled Markov chains
Convexity
Decision making
Decision theory
Dynamic programming
Linear programming
Lower bounds
Markov chains
Markov decision processes
Markov processes
Optimization
Policies
Probability distribution
Safety
stochastic optimal control