Optimal Adaptive Policies for Sequential Allocation Problems

Bibliographic details
Published in: Advances in Applied Mathematics, June 1996, Vol. 17 (2), pp. 122-142
Main authors: Burnetas, Apostolos N.; Katehakis, Michael N.
Format: Article
Language: English
Description
Abstract: Consider the problem of sequential sampling from $m$ statistical populations to maximize the expected sum of outcomes in the long run. Under suitable assumptions on the unknown parameters $\theta \in \Theta$, it is shown that there exists a class $C_R$ of adaptive policies with the following properties: (i) The expected $n$-horizon reward $V_n^{\pi_0}(\theta)$ under any policy $\pi_0$ in $C_R$ is equal to $n\mu^*(\theta) - M(\theta)\log n + o(\log n)$, as $n \to \infty$, where $\mu^*(\theta)$ is the largest population mean and $M(\theta)$ is a constant. (ii) Policies in $C_R$ are asymptotically optimal within a larger class $C_{UF}$ of "uniformly fast convergent" policies in the sense that $\lim_{n\to\infty} \left(n\mu^*(\theta) - V_n^{\pi_0}(\theta)\right) / \left(n\mu^*(\theta) - V_n^{\pi}(\theta)\right) \le 1$, for any $\pi \in C_{UF}$ and any $\theta \in \Theta$ such that $M(\theta) > 0$. Policies in $C_R$ are specified via easily computable indices, defined as unique solutions to dual problems that arise naturally from the functional form of $M(\theta)$. In addition, the assumptions are verified for populations specified by nonparametric discrete univariate distributions with finite support. In the case of normal populations with unknown means and variances, we leave as an open problem the verification of one assumption.
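An index policy of the kind described in the abstract can be sketched for the simplest finite-support case, Bernoulli populations (support {0, 1}). This is a minimal illustration under stated assumptions, not the authors' implementation: the index of population i at time n is assumed to be the largest mean q satisfying T_i * KL(p_i_hat, q) <= log n, where T_i is the number of samples taken from population i and p_i_hat is their average, and it is computed here by bisection rather than through the dual problem the paper uses for general finite support. The function names `bernoulli_kl`, `upper_index`, `index_policy` and `regret_constant`, the one-sample-per-population warm-up, and the Bernoulli form assumed for the constant M(theta) are all hypothetical choices made for this sketch.

```python
import math
import random

# Sketch under assumptions: Bernoulli populations only; all names are hypothetical.


def bernoulli_kl(p: float, q: float) -> float:
    """KL divergence between Bernoulli(p) and Bernoulli(q), treating 0*log(0) as 0."""
    eps = 1e-12
    q = min(max(q, eps), 1 - eps)  # keep the log arguments finite
    kl = 0.0
    if p > 0:
        kl += p * math.log(p / q)
    if p < 1:
        kl += (1 - p) * math.log((1 - p) / (1 - q))
    return kl


def upper_index(mean_hat: float, pulls: int, n: int, iters: int = 50) -> float:
    """Largest q in [mean_hat, 1] with KL(mean_hat, q) <= log(n) / pulls, found by bisection.

    KL(mean_hat, q) is increasing in q on [mean_hat, 1], so the feasible set is an interval.
    """
    budget = math.log(n) / pulls
    lo, hi = mean_hat, 1.0
    for _ in range(iters):
        mid = (lo + hi) / 2
        if bernoulli_kl(mean_hat, mid) <= budget:
            lo = mid  # mid is feasible, so the index is at least mid
        else:
            hi = mid
    return lo


def index_policy(true_means, horizon, seed=0):
    """Simulate the index rule on Bernoulli populations; return (pulls per population, total reward)."""
    rng = random.Random(seed)
    m = len(true_means)
    pulls = [0] * m
    sums = [0.0] * m
    total = 0.0
    for t in range(1, horizon + 1):
        if t <= m:
            arm = t - 1  # hypothetical warm-up: sample each population once
        else:
            indices = [upper_index(sums[i] / pulls[i], pulls[i], t) for i in range(m)]
            arm = max(range(m), key=lambda i: indices[i])  # sample the largest index
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        total += reward
    return pulls, total


def regret_constant(true_means):
    """Assumed Bernoulli form of M(theta): sum over suboptimal i of (mu* - mu_i) / KL(mu_i, mu*)."""
    best = max(true_means)
    return sum((best - mu) / bernoulli_kl(mu, best) for mu in true_means if mu < best)
```

For instance, `index_policy([0.2, 0.8], 2000)` should direct most samples to the second population, and under property (i) the shortfall n * mu* minus the expected reward would be expected to grow roughly like `regret_constant([0.2, 0.8]) * log(n)`, which here equals 1 / log(4). Bisection is used only because it is simple and robust for a one-dimensional monotone constraint.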
ISSN: 0196-8858, 1090-2074
DOI: 10.1006/aama.1996.0007