Controlled Markov Processes With Safety State Constraints
This paper considers a Markov decision process (MDP) model with safety state constraints, which specify polytopic invariance constraints on the state probability distribution (pd) for all time epochs. Typically, in the MDP framework, safety is addressed indirectly by penalizing failure states through the reward function. However, such an approach does not allow imposing hard constraints on the state pd, which could be an issue for practical applications where the chance of failure must be limited to prescribed bounds. In this paper, we explicitly separate state constraints from the reward function. We provide analysis and synthesis methods to impose generalized safety constraints at all time epochs, unlike current constrained MDP approaches where such constraints can only be imposed on the stationary distributions. We show that, contrary to the unconstrained MDP policies, optimal safe MDP policies depend on the initial state pd. We present novel algorithms for both finite- and infinite-horizon MDPs to synthesize feasible decision-making policies that satisfy safety constraints for all time epochs and ensure that the performance is above a computable lower bound. Linear programming implementations of the proposed algorithms are developed, which are formulated by using the duality theory of convex optimization. A swarm control simulation example is also provided to demonstrate the use of proposed algorithms.
Saved in:
Published in: | IEEE transactions on automatic control 2019-03, Vol.64 (3), p.1003-1018 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 1018 |
---|---|
container_issue | 3 |
container_start_page | 1003 |
container_title | IEEE transactions on automatic control |
container_volume | 64 |
creator | Chamie, Mahmoud El; Yu, Yue; Acikmese, Behcet; Ono, Masahiro |
description | This paper considers a Markov decision process (MDP) model with safety state constraints, which specify polytopic invariance constraints on the state probability distribution (pd) for all time epochs. Typically, in the MDP framework, safety is addressed indirectly by penalizing failure states through the reward function. However, such an approach does not allow imposing hard constraints on the state pd, which could be an issue for practical applications where the chance of failure must be limited to prescribed bounds. In this paper, we explicitly separate state constraints from the reward function. We provide analysis and synthesis methods to impose generalized safety constraints at all time epochs, unlike current constrained MDP approaches where such constraints can only be imposed on the stationary distributions. We show that, contrary to the unconstrained MDP policies, optimal safe MDP policies depend on the initial state pd. We present novel algorithms for both finite- and infinite-horizon MDPs to synthesize feasible decision-making policies that satisfy safety constraints for all time epochs and ensure that the performance is above a computable lower bound. Linear programming implementations of the proposed algorithms are developed, which are formulated by using the duality theory of convex optimization. A swarm control simulation example is also provided to demonstrate the use of proposed algorithms. |
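The linear-programming synthesis described in the abstract can be sketched as a generic occupation-measure LP: optimize expected reward over a finite horizon while enforcing a polytopic constraint H p_t <= h on the state distribution at every epoch. The sketch below is a minimal illustration of that idea on a hypothetical 2-state, 2-action MDP; the instance data, the specific constraint, and all names are invented for illustration and are not the paper's actual algorithm or example.

```python
import numpy as np
from scipy.optimize import linprog

# Hypothetical toy MDP (illustrative data, not from the paper):
# states: 0 = nominal, 1 = failure-prone; actions: 0 = cautious, 1 = aggressive.
nS, nA, T = 2, 2, 4
# P[a][s, s']: probability of landing in s' when taking action a in state s.
P = np.array([
    [[0.9, 0.1], [0.8, 0.2]],   # cautious: stays mostly nominal
    [[0.6, 0.4], [0.5, 0.5]],   # aggressive: riskier transitions
])
R = np.array([[1.0, 2.0], [0.5, 3.0]])  # R[s, a]: aggressive pays more
p0 = np.array([1.0, 0.0])               # initial state distribution
# Polytopic safety constraint H @ p_t <= h at every epoch:
# here simply "probability of the failure-prone state <= 0.2".
H, h = np.array([[0.0, 1.0]]), np.array([0.2])

# Decision variables: occupation measures x_t(s, a), flattened.
n = T * nS * nA
idx = lambda t, s, a: t * nS * nA + s * nA + a

A_eq, b_eq, A_ub, b_ub = [], [], [], []
for t in range(T):
    for s in range(nS):
        row = np.zeros(n)
        for a in range(nA):
            row[idx(t, s, a)] = 1.0
        if t == 0:
            A_eq.append(row); b_eq.append(p0[s])        # x_0 marginal = p0
        else:
            for sp in range(nS):                         # flow conservation
                for a in range(nA):
                    row[idx(t - 1, sp, a)] -= P[a][sp, s]
            A_eq.append(row); b_eq.append(0.0)
    for i in range(H.shape[0]):                          # safety at epoch t
        row = np.zeros(n)
        for s in range(nS):
            for a in range(nA):
                row[idx(t, s, a)] = H[i, s]
        A_ub.append(row); b_ub.append(h[i])

c = np.zeros(n)                  # linprog minimizes, so negate the reward
for t in range(T):
    for s in range(nS):
        for a in range(nA):
            c[idx(t, s, a)] = -R[s, a]

res = linprog(c, A_ub=np.array(A_ub), b_ub=np.array(b_ub),
              A_eq=np.array(A_eq), b_eq=np.array(b_eq), bounds=(0, None))
assert res.success
x = res.x.reshape(T, nS, nA)
p = x.sum(axis=2)                                  # state pd at each epoch
policy = x / np.maximum(p[:, :, None], 1e-12)      # pi_t(a|s), time-varying
print("max failure-state probability over horizon:", p[:, 1].max())
```

Recovering the policy from the occupation measure makes visible a point the abstract stresses: because the constraints couple all epochs to p0, the resulting (generally randomized, time-varying) safe policy depends on the initial distribution, unlike unconstrained MDP optima. This sketch only constrains the epochs where actions are taken; the paper's own algorithms additionally handle the infinite-horizon case and performance lower bounds.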
doi_str_mv | 10.1109/TAC.2018.2849556 |
format | Article |
fullrecord | (raw Primo/ProQuest XML export; fields not repeated elsewhere in this record: publisher New York: IEEE; 16 pages; peer reviewed; free to read; IEEE Xplore document 8391697; ProQuest id 2187965149; author ORCIDs 0000-0002-8693-8109, 0000-0002-8309-1838, 0000-0003-1317-6710) |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 0018-9286 |
ispartof | IEEE transactions on automatic control, 2019-03, Vol.64 (3), p.1003-1018 |
issn | 0018-9286 1558-2523 |
language | eng |
recordid | cdi_crossref_primary_10_1109_TAC_2018_2849556 |
source | IEEE Xplore |
subjects | agents and autonomous systems; Algorithms; Computational geometry; Computer simulation; constrained control; Constraint modelling; Control simulation; controlled Markov chains; Convexity; Decision making; Decision theory; Dynamic programming; Linear programming; Lower bounds; Markov chains; Markov decision processes; Markov processes; Optimization; Policies; Probability distribution; Safety; stochastic optimal control |
title | Controlled Markov Processes With Safety State Constraints |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-17T19%3A44%3A36IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Controlled%20Markov%20Processes%20With%20Safety%20State%20Constraints&rft.jtitle=IEEE%20transactions%20on%20automatic%20control&rft.au=Chamie,%20Mahmoud%20El&rft.date=2019-03-01&rft.volume=64&rft.issue=3&rft.spage=1003&rft.epage=1018&rft.pages=1003-1018&rft.issn=0018-9286&rft.eissn=1558-2523&rft.coden=IETAA9&rft_id=info:doi/10.1109/TAC.2018.2849556&rft_dat=%3Cproquest_RIE%3E2187965149%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2187965149&rft_id=info:pmid/&rft_ieee_id=8391697&rfr_iscdi=true |