Asymptotically Optimal Multi-Armed Bandit Policies Under a Cost Constraint

Bibliographic Details
Published in: Probability in the Engineering and Informational Sciences, 2017-07, Vol. 31 (3), pp. 284-310
Authors: Burnetas, Apostolos; Kanavetas, Odysseas; Katehakis, Michael N.
Format: Article
Language: English
Description
Abstract: We consider the multi-armed bandit problem under a cost constraint. Successive samples from each population are i.i.d. with unknown distribution, and each sample incurs a known population-dependent cost. The objective is to design an adaptive sampling policy that maximizes the expected sum of n samples such that the average cost does not exceed a given bound sample-path-wise. We establish an asymptotic lower bound for the regret of feasible uniformly fast convergent policies and construct a class of policies that achieve the bound. We also provide their explicit form under Normal distributions with unknown means and known variances.
ISSN: 0269-9648, 1469-8951
DOI: 10.1017/S026996481600036X
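
The abstract describes a cost-constrained bandit model: each pull of an arm yields an i.i.d. reward and incurs a known per-arm cost, and the running average cost must stay at or below a given bound along the sample path. The Python sketch below simulates this setup for Normal arms with known variances, using a generic UCB-style index together with a greedy feasibility check. It is a minimal illustration under assumed conventions, not the asymptotically optimal policy constructed in the paper; the function name, the parameters (c0, horizon), and the arm values in the example are all hypothetical.

import numpy as np

def constrained_normal_bandit(means, sigmas, costs, c0, horizon, seed=0):
    """Simulate a UCB-style index policy for Normal arms with known
    variances under an average-cost bound c0. Illustrative sketch only:
    the index and feasibility rule below are assumptions, not the
    paper's asymptotically optimal policy."""
    rng = np.random.default_rng(seed)
    means = np.asarray(means, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    costs = np.asarray(costs, dtype=float)
    k = len(means)
    counts = np.zeros(k, dtype=int)
    sums = np.zeros(k)
    total_cost = 0.0
    total_reward = 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            # Initialization: sample each arm once (in this sketch the
            # budget may be transiently violated during this phase).
            arm = t - 1
        else:
            mu_hat = sums / counts
            # Exploration bonus scaled by the known standard deviations.
            index = mu_hat + sigmas * np.sqrt(2.0 * np.log(t) / counts)
            # Restrict to arms that keep the running average cost <= c0.
            feasible = [a for a in range(k)
                        if (total_cost + costs[a]) / t <= c0]
            # Fallback heuristic: if no arm preserves feasibility,
            # take the cheapest arm.
            candidates = feasible or [int(np.argmin(costs))]
            arm = max(candidates, key=lambda a: index[a])
        reward = rng.normal(means[arm], sigmas[arm])
        counts[arm] += 1
        sums[arm] += reward
        total_cost += costs[arm]
        total_reward += reward

    return total_reward, total_cost / horizon

# Example run with hypothetical arm parameters and budget.
reward, avg_cost = constrained_normal_bandit(
    means=[1.0, 1.5, 0.8], sigmas=[1.0, 1.0, 1.0],
    costs=[0.5, 1.2, 0.3], c0=0.8, horizon=10_000)
print(f"total reward: {reward:.1f}, realized average cost: {avg_cost:.3f}")

The fallback to the cheapest arm when no arm preserves feasibility is a simplification chosen for brevity; the paper instead constructs policies whose structure guarantees the sample-path cost constraint while attaining the asymptotic regret lower bound.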