Optimal Adaptive Policies for Sequential Allocation Problems
Published in: | Advances in Applied Mathematics, 1996-06, Vol. 17 (2), pp. 122-142 |
---|---|
Main authors: | , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Abstract: | Consider the problem of sequential sampling from m statistical populations to maximize the expected sum of outcomes in the long run. Under suitable assumptions on the unknown parameters [formula], it is shown that there exists a class C_R of adaptive policies with the following properties: (i) The expected n-horizon reward [formula] under any policy π0 in C_R is equal to [formula], as n → ∞, where [formula] is the largest population mean and [formula] is a constant. (ii) Policies in C_R are asymptotically optimal within a larger class C_UF of “uniformly fast convergent” policies in the sense that [formula], for any π ∈ C_UF and any [formula] such that [formula]. Policies in C_R are specified via easily computable indices, defined as unique solutions to dual problems that arise naturally from the functional form of [formula]. In addition, the assumptions are verified for populations specified by nonparametric discrete univariate distributions with finite support. In the case of normal populations with unknown means and variances, we leave as an open problem the verification of one assumption. |
---|---|
ISSN: | 0196-8858 1090-2074 |
DOI: | 10.1006/aama.1996.0007 |
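
The abstract describes policies driven by per-population indices that are computed as solutions of small optimization problems. As a rough illustration only, the sketch below assumes Bernoulli populations (a special case of finite support) and uses a Kullback-Leibler upper-confidence index found by bisection. The function names, the `log t` exploration budget, and the round-robin initialization are assumptions made for this example; they are not taken from the article and should not be read as its exact construction.

```python
import math
import random


def kl_bernoulli(p, q):
    """KL divergence between Bernoulli(p) and Bernoulli(q), clipped away from 0 and 1."""
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))


def kl_index(mean, pulls, t, tol=1e-9):
    """Largest q in [mean, 1] with pulls * KL(mean, q) <= log(t), found by bisection.

    Assumed index form for this sketch: an optimistic estimate of the
    population mean that shrinks toward `mean` as `pulls` grows.
    """
    budget = math.log(t) / pulls
    lo, hi = mean, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= budget:
            lo = mid  # mid is still plausible, move the lower end up
        else:
            hi = mid  # mid is ruled out, move the upper end down
    return lo


def run_index_policy(true_means, horizon, seed=0):
    """Simulate the index policy; return how often each population was sampled."""
    rng = random.Random(seed)
    m = len(true_means)
    pulls = [0] * m
    sums = [0.0] * m
    for t in range(1, horizon + 1):
        if t <= m:
            arm = t - 1  # sample every population once before using indices
        else:
            arm = max(
                range(m),
                key=lambda a: kl_index(sums[a] / pulls[a], pulls[a], t),
            )
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
    return pulls
```

Calling `run_index_policy([0.2, 0.8], horizon=2000)` returns the number of samples taken from each population. Under these assumptions the population with the larger mean should receive most of the samples, while the other is sampled only about logarithmically often in the horizon.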