On an index policy for restless bandits

We investigate the optimal allocation of effort to a collection of n projects. The projects are 'restless' in that the state of a project evolves in time, whether or not it is allocated effort. The evolution of the state of each project follows a Markov rule, but transitions and rewards depend on whether or not the project receives effort. The objective is to maximize the expected time-average reward under a constraint that exactly m of the n projects receive effort at any one time. We show that as m and n tend to ∞ with m/n fixed, the per-project reward of the optimal policy is asymptotically the same as that achieved by a policy which operates under the relaxed constraint that an average of m projects be active. The relaxed constraint was considered by Whittle (1988), who described how to use a Lagrangian multiplier approach to assign indices to the projects. He conjectured that the policy of allocating effort to the m projects of greatest index is asymptotically optimal as m and n tend to ∞. We show that the conjecture is true if the differential equation describing the fluid approximation to the index policy has a globally stable equilibrium point. This need not be the case, and we present an example for which the index policy is not asymptotically optimal. However, numerical work suggests that such counterexamples are extremely rare and that the size of the suboptimality which one might expect is minuscule.
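The policy the abstract refers to is Whittle's priority rule: relax the constraint "exactly m projects active" to "m projects active on average", attach an index to each project state via the Lagrange multiplier of that relaxation, and at each decision epoch activate the m projects whose current states carry the largest indices. The Python sketch below is only illustrative and is not taken from the paper: it assumes the projects are indexable and that a table of indices (whittle_index), a transition sampler (step), and a one-period reward function (reward) are already available for some concrete model; all names and the toy example are assumptions introduced here.

import heapq
import random

def index_policy(states, whittle_index, m):
    # Activate the m projects whose current states have the largest indices.
    return set(heapq.nlargest(m, range(len(states)),
                              key=lambda i: whittle_index[states[i]]))

def simulate(n, m, whittle_index, step, reward, horizon, seed=0):
    # Crude Monte Carlo estimate of the per-project time-average reward
    # earned by the index policy. `step(x, active, rng)` samples a project's
    # next state and `reward(x, active)` gives its one-period reward; both
    # stand in for the Markov transition/reward data of a concrete model.
    rng = random.Random(seed)
    states = [0] * n                   # start every project in state 0
    total = 0.0
    for _ in range(horizon):
        active = index_policy(states, whittle_index, m)
        total += sum(reward(x, i in active) for i, x in enumerate(states))
        states = [step(x, i in active, rng) for i, x in enumerate(states)]
    return total / (horizon * n)

# Toy two-state projects, purely illustrative: state 1 pays 1 when served,
# serving tends to reset a project to state 0, idleness pushes it to state 1.
toy_index = {0: 0.2, 1: 0.8}
toy_reward = lambda x, active: 1.0 if (x == 1 and active) else 0.0
toy_step = lambda x, active, rng: ((0 if active else 1)
                                   if rng.random() < 0.7 else x)
print(simulate(n=100, m=30, whittle_index=toy_index,
               step=toy_step, reward=toy_reward, horizon=2000))

The estimate produced this way can be compared with the bound obtained under the relaxed (average) constraint, which is the comparison underlying the asymptotic-optimality result described in the abstract.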

Bibliographic details
Published in: Journal of applied probability, 1990-09, Vol. 27 (3), pp. 637-648
Authors: Weber, Richard R.; Weiss, Gideon
Format: Article
Language: English
Online access: Full text
DOI: 10.2307/3214547
ISSN: 0021-9002
EISSN: 1475-6072
Source: JSTOR; JSTOR Mathematics & Business
Subjects:
Applications
Approximation
Biology, psychology, social sciences
Counterexamples
Differential equations
Eigenvalues
Exact sciences and technology
Insurance, economics, finance
Limit cycles
Markov processes
Markov's principle
Mathematics
Medical sciences
Optimal policy
Probability and statistics
Reliability, life testing, quality control
Research Papers
Sciences and techniques of general use
Statistics
Subsidies
Sufficient conditions