Thompson Sampling for Dynamic Multi-armed Bandits

Bibliographic Details
Main Authors: Gupta, N., Granmo, O-C, Agrawala, A.
Format: Conference Proceedings
Language: English

Description
Abstract: The importance of multi-armed bandit (MAB) problems is on the rise due to their recent application in a large variety of areas such as online advertising, news article selection, wireless networks, and medicinal trials, to name a few. The most common assumption made when solving such MAB problems is that the unknown reward probability θ_k of each bandit arm k is fixed. However, this assumption rarely holds in practice, simply because real-life problems often involve underlying processes that are dynamically evolving. In this paper, we model problems where the reward probabilities θ_k are drifting, and introduce a new method called Dynamic Thompson Sampling (DTS) that facilitates Order Statistics based Thompson Sampling for these dynamically evolving MABs. The DTS algorithm adapts its success probability estimates, θ̂_k, faster than traditional Thompson Sampling schemes and thus leads to improved performance in terms of lower regret. Extensive experiments demonstrate that DTS outperforms current state-of-the-art approaches, namely pure Thompson Sampling, UCB-Normal, and UCBf, for the case of dynamic reward probabilities. Furthermore, this performance advantage increases persistently with the number of bandit arms.
DOI: 10.1109/ICMLA.2011.144
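
The abstract describes DTS as adapting its estimates θ̂_k faster than standard Thompson Sampling so that it can track drifting reward probabilities. The sketch below illustrates one way such faster adaptation can be realized: Bernoulli Thompson Sampling with Beta posteriors whose effective sample size is capped at a threshold C, so that older observations are gradually discounted. The capped update rule, the threshold value C = 100, the random-walk drift model, and all function names here are illustrative assumptions, not details taken from the paper.

```python
# Illustrative sketch: Thompson Sampling with capped Beta parameters for
# Bernoulli bandits whose reward probabilities drift over time.
# All parameter values and helper names are assumptions for illustration only.
import random


def ts_pull(alphas, betas):
    """Pick the arm whose Beta(alpha, beta) sample is largest (Thompson Sampling)."""
    samples = [random.betavariate(a, b) for a, b in zip(alphas, betas)]
    return max(range(len(samples)), key=lambda k: samples[k])


def capped_update(alphas, betas, k, reward, C=100.0):
    """Update arm k's Beta parameters with a cap on their sum.

    While alpha_k + beta_k < C the update is the usual conjugate one; once the
    threshold is reached, the parameters are rescaled so their sum stays at C,
    which discounts old observations and lets the estimate track a drifting
    reward probability.
    """
    if alphas[k] + betas[k] < C:
        alphas[k] += reward
        betas[k] += 1.0 - reward
    else:
        alphas[k] = (alphas[k] + reward) * C / (C + 1.0)
        betas[k] = (betas[k] + 1.0 - reward) * C / (C + 1.0)


if __name__ == "__main__":
    n_arms, horizon = 5, 10_000
    theta = [random.random() for _ in range(n_arms)]   # true (drifting) reward probabilities
    alphas, betas = [1.0] * n_arms, [1.0] * n_arms     # Beta(1, 1) priors
    total_reward = 0.0
    for t in range(horizon):
        # Illustrative drift model: each theta_k takes a small bounded random walk.
        theta = [min(1.0, max(0.0, p + random.gauss(0.0, 0.01))) for p in theta]
        k = ts_pull(alphas, betas)
        reward = 1.0 if random.random() < theta[k] else 0.0
        total_reward += reward
        capped_update(alphas, betas, k, reward)
    print(f"average reward over {horizon} pulls: {total_reward / horizon:.3f}")
```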