Optimal Policy for Bernoulli Bandits: Computation and Algorithm Gauge

Bernoulli multi-armed bandits are a reinforcement learning model used to study a variety of choice optimization problems. Often such optimizations concern a finite-time horizon. In principle, statistically optimal policies can be computed via dynamic programming, but doing so is considered infeasibl...

Bibliographic Details
Published in:	IEEE Transactions on Artificial Intelligence, 2021-02, Vol. 2 (1), pp. 2-17
Main authors:	Pilarski, Sebastian; Pilarski, Slawomir; Varro, Daniel
Format:	Article
Language:	English
Subjects:	Artificial intelligence; Clinical trials; Computational modeling; epsilon-greedy; Games; Gittins index (GI); multi-armed Bernoulli bandits; optimal policy (OPT); POKER; Random variables; Statistics; Testing; Thompson sampling (TS); upper-confidence bound (UCB); Whittle index (WI)