Technical note: A computationally efficient algorithm for undiscounted Markov decision processes with restricted observations

We present a computationally efficient procedure for determining control policies for an infinite-horizon Markov decision process with restricted observations. The optimal policy for the system with restricted observations is a function of the observation process and not of the unobservable states of the...
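
The structural point of the abstract, that a policy picks one action per observation and so every state sharing that observation gets the same action, can be illustrated with a minimal Python sketch. This is not the algorithm of the paper: it evaluates the long-run average cost by plain power iteration and improves the policy with a simple one-observation-at-a-time local search. All names (`stationary_distribution`, `average_cost`, `local_search`, `obs_of`) and all numbers are hypothetical, and the sketch assumes every policy induces a chain with a single recurrent class.

```python
from itertools import product


def stationary_distribution(P, iters=2000):
    """Approximate the stationary distribution of a row-stochastic matrix P.

    Power iteration on the lazy chain (I + P) / 2, which has the same
    stationary distribution as P and is aperiodic.
    """
    n = len(P)
    pi = [1.0 / n] * n
    for _ in range(iters):
        nxt = [0.5 * pi[j] for j in range(n)]
        for i in range(n):
            for j in range(n):
                nxt[j] += 0.5 * pi[i] * P[i][j]
        pi = nxt
    return pi


def average_cost(policy, obs_of, P, c):
    """Long-run average cost of a deterministic observation-based policy.

    policy[o] is the action taken when observation o is seen,
    obs_of[s] is the observation emitted by state s,
    P[a][s] is the transition row of state s under action a,
    c[a][s] is the one-step cost of action a in state s.
    """
    n = len(obs_of)
    acts = [policy[obs_of[s]] for s in range(n)]  # same action within a class
    P_pi = [P[acts[s]][s] for s in range(n)]
    pi = stationary_distribution(P_pi)
    return sum(pi[s] * c[acts[s]][s] for s in range(n))


def local_search(policy, obs_of, P, c, tol=1e-9):
    """Change one observation's action at a time while the cost decreases."""
    best = average_cost(policy, obs_of, P, c)
    improved = True
    while improved:
        improved = False
        for o in range(len(policy)):
            for a in range(len(P)):
                if a == policy[o]:
                    continue
                cand = policy[:o] + (a,) + policy[o + 1:]
                g = average_cost(cand, obs_of, P, c)
                if g < best - tol:
                    policy, best, improved = cand, g, True
    return policy, best


# Hypothetical instance: 3 states, 2 actions; states 0 and 1 are
# indistinguishable (observation 0), state 2 emits observation 1.
obs_of = [0, 0, 1]
P = [
    [[0.5, 0.4, 0.1], [0.3, 0.5, 0.2], [0.2, 0.3, 0.5]],  # action 0
    [[0.1, 0.2, 0.7], [0.6, 0.3, 0.1], [0.7, 0.2, 0.1]],  # action 1
]
c = [
    [1.0, 3.0, 4.0],  # action 0
    [2.0, 0.5, 1.0],  # action 1
]

found_policy, found_cost = local_search((0, 0), obs_of, P, c)
# Exhaustive check over all observation-based policies (feasible only when tiny).
best_overall = min(average_cost(p, obs_of, P, c) for p in product(range(2), repeat=2))
print(found_policy, round(found_cost, 4), round(best_overall, 4))
```

A local search of this kind can stop at a policy that is only locally optimal, which is why the sketch also enumerates every policy for comparison on the toy instance.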

Bibliographic details
Published in:	Naval research logistics 2009-02, Vol.56 (1), p.86-92
Main authors:	Davis, Lauren B.; Hodgson, Thom J.; King, Russell E.; Wei, Wenbin
Format:	Article
Language:	eng
Subjects:	heuristics; Markov Decision process; optimal control
Online access:	Full text

container_end_page	92
container_issue	1
container_start_page	86
container_title	Naval research logistics
container_volume	56
creator	Davis, Lauren B. ; Hodgson, Thom J. ; King, Russell E. ; Wei, Wenbin
description	We present a computationally efficient procedure for determining control policies for an infinite-horizon Markov decision process with restricted observations. The optimal policy for the system with restricted observations is a function of the observation process and not of the unobservable states of the system. Thus, the policy is stationary with respect to the partitioned state space. The proposed algorithm addresses the undiscounted average cost case and combines a local search with a modified version of Howard's policy iteration method (Dynamic Programming and Markov Processes, MIT Press, Cambridge, MA, 1960). We demonstrate empirically that the algorithm finds the optimal deterministic policy for over 96% of the problem instances generated. For large-scale problem instances, we demonstrate that the average cost associated with the locally optimal policy is lower than the average cost associated with an integer-rounded policy produced by the algorithm of Serin and Kulkarni (Math Methods Oper Res 61 (2005) 311–328). © 2008 Wiley Periodicals, Inc. Naval Research Logistics 2009
doi_str_mv	10.1002/nav.20329
format	Article
fullrecord	<record><control><sourceid>proquest_cross</sourceid><recordid>TN_cdi_proquest_miscellaneous_32983188</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>32983188</sourcerecordid><originalsourceid>FETCH-LOGICAL-c3929-8b5f8bc068b9a48c25a7f3d3868d4e81e3f57366755d90a94c5f6a1f1f3d39d3</originalsourceid><addsrcrecordid>eNp1kL1OwzAURi0EEuVn4A08ITEEnDhObLYKUUAqZYkAsViuc00NaVzspNCBd8elwMbk5Zyrzweho5ScpoRkZ61anmaEZmILDVKWkaQoGdlGA8JFnpBCPO6ivRBeCCFFTtgAfVagZ63VqsGt6-AcD7F280Xfqc66VjXNCoMxVltoO6yaZ-dtN5tj4zzu29oG7fq2gxrfKv_qlrgGbUMU8cI7DSFAwO9RwB5C561ek24awC-_z4cDtGNUE-Dw591H1eiyurhOxndXNxfDcaKpyETCp8zwqSYFnwqVc50xVRpaU17wOgeeAjWspEX8KqsFUSLXzBQqNekaEjXdR8ebs3HVWx-nyHlcDk2jWnB9kDEXpynnETzZgNq7EDwYufB2rvxKpkSu-8rYV373jezZhn23Daz-B-VkeP9rJBvDhg4-_oxYThYlLZl8mFzJMSO39GlUyXv6BUbAj3Q</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>32983188</pqid></control><display><type>article</type><title>Technical note: A computationally efficient algorithm for undiscounted Markov decision processes with restricted observations</title><source>Wiley Online Library</source><creator>Davis, Lauren B. ; Hodgson, Thom J. ; King, Russell E. ; Wei, Wenbin</creator><creatorcontrib>Davis, Lauren B. ; Hodgson, Thom J. ; King, Russell E. ; Wei, Wenbin</creatorcontrib><description>We present a computationally efficient procedure to determine control policies for an infinite horizon Markov Decision process with restricted observations. The optimal policy for the system with restricted observations is a function of the observation process and not the unobservable states of the system. Thus, the policy is stationary with respect to the partitioned state space. The algorithm we propose addresses the undiscounted average cost case. 
The algorithm combines a local search with a modified version of Howard's (Dynamic programming and Markov processes, MIT Press, Cambridge, MA, 1960) policy iteration method. We demonstrate empirically that the algorithm finds the optimal deterministic policy for over 96% of the problem instances generated. For large scale problem instances, we demonstrate that the average cost associated with the local optimal policy is lower than the average cost associated with an integer rounded policy produced by the algorithm of Serin and Kulkarni Math Methods Oper Res 61 (2005) 311–328. © 2008 Wiley Periodicals, Inc. Naval Research Logistics 2009</description><identifier>ISSN: 0894-069X</identifier><identifier>EISSN: 1520-6750</identifier><identifier>DOI: 10.1002/nav.20329</identifier><language>eng</language><publisher>Hoboken: Wiley Subscription Services, Inc., A Wiley Company</publisher><subject>heuristics ; Markov Decision process ; optimal control</subject><ispartof>Naval research logistics, 2009-02, Vol.56 (1), p.86-92</ispartof><rights>Copyright © 2008 Wiley Periodicals, Inc.</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><citedby>FETCH-LOGICAL-c3929-8b5f8bc068b9a48c25a7f3d3868d4e81e3f57366755d90a94c5f6a1f1f3d39d3</citedby><cites>FETCH-LOGICAL-c3929-8b5f8bc068b9a48c25a7f3d3868d4e81e3f57366755d90a94c5f6a1f1f3d39d3</cites></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktopdf>$$Uhttps://onlinelibrary.wiley.com/doi/pdf/10.1002%2Fnav.20329$$EPDF$$P50$$Gwiley$$H</linktopdf><linktohtml>$$Uhttps://onlinelibrary.wiley.com/doi/full/10.1002%2Fnav.20329$$EHTML$$P50$$Gwiley$$H</linktohtml><link.rule.ids>314,780,784,1417,27924,27925,45574,45575</link.rule.ids></links><search><creatorcontrib>Davis, Lauren B.</creatorcontrib><creatorcontrib>Hodgson, Thom J.</creatorcontrib><creatorcontrib>King, Russell 
E.</creatorcontrib><creatorcontrib>Wei, Wenbin</creatorcontrib><title>Technical note: A computationally efficient algorithm for undiscounted Markov decision processes with restricted observations</title><title>Naval research logistics</title><addtitle>Naval Research Logistics</addtitle><description>We present a computationally efficient procedure to determine control policies for an infinite horizon Markov Decision process with restricted observations. The optimal policy for the system with restricted observations is a function of the observation process and not the unobservable states of the system. Thus, the policy is stationary with respect to the partitioned state space. The algorithm we propose addresses the undiscounted average cost case. The algorithm combines a local search with a modified version of Howard's (Dynamic programming and Markov processes, MIT Press, Cambridge, MA, 1960) policy iteration method. We demonstrate empirically that the algorithm finds the optimal deterministic policy for over 96% of the problem instances generated. For large scale problem instances, we demonstrate that the average cost associated with the local optimal policy is lower than the average cost associated with an integer rounded policy produced by the algorithm of Serin and Kulkarni Math Methods Oper Res 61 (2005) 311–328. © 2008 Wiley Periodicals, Inc. 
Naval Research Logistics 2009</description><subject>heuristics</subject><subject>Markov Decision process</subject><subject>optimal control</subject><issn>0894-069X</issn><issn>1520-6750</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2009</creationdate><recordtype>article</recordtype><recordid>eNp1kL1OwzAURi0EEuVn4A08ITEEnDhObLYKUUAqZYkAsViuc00NaVzspNCBd8elwMbk5Zyrzweho5ScpoRkZ61anmaEZmILDVKWkaQoGdlGA8JFnpBCPO6ivRBeCCFFTtgAfVagZ63VqsGt6-AcD7F280Xfqc66VjXNCoMxVltoO6yaZ-dtN5tj4zzu29oG7fq2gxrfKv_qlrgGbUMU8cI7DSFAwO9RwB5C561ek24awC-_z4cDtGNUE-Dw591H1eiyurhOxndXNxfDcaKpyETCp8zwqSYFnwqVc50xVRpaU17wOgeeAjWspEX8KqsFUSLXzBQqNekaEjXdR8ebs3HVWx-nyHlcDk2jWnB9kDEXpynnETzZgNq7EDwYufB2rvxKpkSu-8rYV373jezZhn23Daz-B-VkeP9rJBvDhg4-_oxYThYlLZl8mFzJMSO39GlUyXv6BUbAj3Q</recordid><startdate>200902</startdate><enddate>200902</enddate><creator>Davis, Lauren B.</creator><creator>Hodgson, Thom J.</creator><creator>King, Russell E.</creator><creator>Wei, Wenbin</creator><general>Wiley Subscription Services, Inc., A Wiley Company</general><scope>BSCLL</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7TB</scope><scope>8FD</scope><scope>FR3</scope><scope>JQ2</scope><scope>KR7</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope></search><sort><creationdate>200902</creationdate><title>Technical note: A computationally efficient algorithm for undiscounted Markov decision processes with restricted observations</title><author>Davis, Lauren B. ; Hodgson, Thom J. ; King, Russell E. 
; Wei, Wenbin</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c3929-8b5f8bc068b9a48c25a7f3d3868d4e81e3f57366755d90a94c5f6a1f1f3d39d3</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2009</creationdate><topic>heuristics</topic><topic>Markov Decision process</topic><topic>optimal control</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Davis, Lauren B.</creatorcontrib><creatorcontrib>Hodgson, Thom J.</creatorcontrib><creatorcontrib>King, Russell E.</creatorcontrib><creatorcontrib>Wei, Wenbin</creatorcontrib><collection>Istex</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Mechanical & Transportation Engineering Abstracts</collection><collection>Technology Research Database</collection><collection>Engineering Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Civil Engineering Abstracts</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts Academic</collection><collection>Computer and Information Systems Abstracts Professional</collection><jtitle>Naval research logistics</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext</fulltext></delivery><addata><au>Davis, Lauren B.</au><au>Hodgson, Thom J.</au><au>King, Russell E.</au><au>Wei, Wenbin</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Technical note: A computationally efficient algorithm for undiscounted Markov decision processes with restricted observations</atitle><jtitle>Naval research logistics</jtitle><addtitle>Naval Research 
Logistics</addtitle><date>2009-02</date><risdate>2009</risdate><volume>56</volume><issue>1</issue><spage>86</spage><epage>92</epage><pages>86-92</pages><issn>0894-069X</issn><eissn>1520-6750</eissn><abstract>We present a computationally efficient procedure to determine control policies for an infinite horizon Markov Decision process with restricted observations. The optimal policy for the system with restricted observations is a function of the observation process and not the unobservable states of the system. Thus, the policy is stationary with respect to the partitioned state space. The algorithm we propose addresses the undiscounted average cost case. The algorithm combines a local search with a modified version of Howard's (Dynamic programming and Markov processes, MIT Press, Cambridge, MA, 1960) policy iteration method. We demonstrate empirically that the algorithm finds the optimal deterministic policy for over 96% of the problem instances generated. For large scale problem instances, we demonstrate that the average cost associated with the local optimal policy is lower than the average cost associated with an integer rounded policy produced by the algorithm of Serin and Kulkarni Math Methods Oper Res 61 (2005) 311–328. © 2008 Wiley Periodicals, Inc. Naval Research Logistics 2009</abstract><cop>Hoboken</cop><pub>Wiley Subscription Services, Inc., A Wiley Company</pub><doi>10.1002/nav.20329</doi><tpages>7</tpages></addata></record>
fulltext	fulltext
identifier	ISSN: 0894-069X
ispartof	Naval research logistics, 2009-02, Vol.56 (1), p.86-92
issn	0894-069X 1520-6750
language	eng
recordid	cdi_proquest_miscellaneous_32983188
source	Wiley Online Library
subjects	heuristics ; Markov Decision process ; optimal control
title	Technical note: A computationally efficient algorithm for undiscounted Markov decision processes with restricted observations
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-01T05%3A59%3A48IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Technical%20note:%20A%20computationally%20efficient%20algorithm%20for%20undiscounted%20Markov%20decision%20processes%20with%20restricted%20observations&rft.jtitle=Naval%20research%20logistics&rft.au=Davis,%20Lauren%20B.&rft.date=2009-02&rft.volume=56&rft.issue=1&rft.spage=86&rft.epage=92&rft.pages=86-92&rft.issn=0894-069X&rft.eissn=1520-6750&rft_id=info:doi/10.1002/nav.20329&rft_dat=%3Cproquest_cross%3E32983188%3C/proquest_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=32983188&rft_id=info:pmid/&rfr_iscdi=true