THE MULTI-ARMED BANDIT PROBLEM: AN EFFICIENT NONPARAMETRIC SOLUTION

Bibliographic Details
Published in: The Annals of Statistics, 2020-02, Vol. 48 (1), pp. 346-373
Author: Chan, Hock Peng
Format: Article
Language: English
Subjects:
Online Access: Full text
Description
Abstract: Lai and Robbins (Adv. in Appl. Math. 6 (1985) 4–22) and Lai (Ann. Statist. 15 (1987) 1091–1114) provided efficient parametric solutions to the multi-armed bandit problem, showing that arm allocation via upper confidence bounds (UCB) achieves minimum regret. These bounds are constructed from the Kullback–Leibler information of the reward distributions, estimated from specified parametric families. In recent years, there has been renewed interest in the multi-armed bandit problem due to new applications in machine learning algorithms and data analytics. Nonparametric arm allocation procedures such as ϵ-greedy, Boltzmann exploration and BESA have been studied, and modified versions of the UCB procedure have also been analyzed under nonparametric settings. However, unlike UCB, these nonparametric procedures are not efficient under general parametric settings. In this paper, we propose efficient nonparametric procedures.
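For orientation, the abstract's reference to arm allocation via upper confidence bounds can be made concrete with a minimal sketch. The code below uses the standard UCB1 index (empirical mean plus a sqrt(2 log t / n) exploration bonus) rather than the KL-based bounds of Lai and Robbins or the nonparametric procedures proposed in the paper; the function `ucb1` and the Bernoulli arms are illustrative assumptions, not code from the article.

```python
import math
import random

def ucb1(reward_fns, horizon, seed=0):
    """Illustrative UCB1 allocation (not the paper's procedure):
    pull the arm maximizing empirical mean + exploration bonus."""
    rng = random.Random(seed)
    k = len(reward_fns)
    counts = [0] * k      # number of pulls per arm
    means = [0.0] * k     # running mean reward per arm

    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1   # pull each arm once to initialize
        else:
            # UCB1 index: empirical mean + sqrt(2 log t / n_i)
            arm = max(range(k),
                      key=lambda i: means[i] + math.sqrt(2 * math.log(t) / counts[i]))
        r = reward_fns[arm](rng)
        counts[arm] += 1
        means[arm] += (r - means[arm]) / counts[arm]   # incremental mean update
    return counts, means

# Example: two Bernoulli arms with success probabilities 0.4 and 0.6.
arms = [lambda rng: float(rng.random() < 0.4),
        lambda rng: float(rng.random() < 0.6)]
counts, means = ucb1(arms, horizon=10_000)
print(counts, [round(m, 3) for m in means])
```

In this toy run, the index concentrates most pulls on the better arm while the exploration bonus keeps the inferior arm sampled occasionally; the nonparametric alternatives named in the abstract (ϵ-greedy, Boltzmann exploration, BESA) differ only in how the next arm is selected.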
ISSN: 0090-5364, 2168-8966
DOI: 10.1214/19-aos1809