Asymptotic Bayes Analysis for the Finite-Horizon One-Armed-Bandit Problem
Published in: Probability in the Engineering and Informational Sciences, 2003-01, Vol. 17 (1), pp. 53-82
Main authors: ,
Format: Article
Language: English
Online access: Full text
Abstract: The multiarmed-bandit problem is often taken as a basic model for the trade-off between the exploration and utilization required for efficient optimization under uncertainty. In this article, we study the situation in which the unknown performance of a new bandit is to be evaluated and compared with that of a known one over a finite horizon. We assume that the bandits represent random variables with distributions from the one-parameter exponential family. When the objective is to maximize the Bayes expected sum of outcomes over a finite horizon, it is shown that optimal policies tend to simple limits when the length of the horizon is large.
ISSN: 0269-9648, 1469-8951
DOI: 10.1017/S0269964803171045
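
To make the setup in the abstract concrete, below is a minimal simulation sketch under assumptions not stated in the record: the unknown arm is Bernoulli (a member of the one-parameter exponential family) with a Beta prior, the known arm pays a fixed mean per play, and the policy is a simple posterior-mean stopping rule. This rule is only a heuristic stand-in for the optimal Bayes policy analyzed in the article; the function and parameter names are illustrative.

import random

def simulate_one_armed_bandit(horizon, known_mean, a=1.0, b=1.0, seed=0):
    # Illustrative sketch, not the article's construction: the unknown arm
    # is Bernoulli(p) with p ~ Beta(a, b); the known arm pays known_mean
    # per play.  The heuristic plays the unknown arm while its posterior
    # mean exceeds known_mean, then retires to the known arm for the rest
    # of the horizon (for one-armed bandits, playing the known arm yields
    # no information, so optimal Bayes policies are stopping rules of this
    # never-return form).
    rng = random.Random(seed)
    p = rng.betavariate(a, b)      # true, unobserved success probability
    successes = failures = 0
    total = 0.0
    for t in range(horizon):
        posterior_mean = (a + successes) / (a + b + successes + failures)
        if posterior_mean <= known_mean:
            total += known_mean * (horizon - t)   # retire: play known arm out
            break
        reward = 1.0 if rng.random() < p else 0.0  # sample the unknown arm
        successes += int(reward)
        failures += 1 - int(reward)
        total += reward
    return total

# Averaging over many prior draws estimates the Bayes value of this
# heuristic for a given horizon.
est = sum(simulate_one_armed_bandit(1000, known_mean=0.45, seed=s)
          for s in range(500)) / 500
print(est)

Note that the posterior-mean threshold under-explores relative to the optimal policy, whose index exceeds the posterior mean while plays remain; it is used here only to keep the sketch short.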