Multi-armed bandit with sub-exponential rewards

In this work we consider a general class of multi-armed bandits (MAB) problems with sub-exponential rewards. This is primarily motivated by service systems with exponential inter-arrival and service distributions. It is well-known that the celebrated Upper Confidence Bound (UCB) algorithm can achiev...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Operations research letters 2021-08, Vol.49 (5)
Hauptverfasser:	Jia, Huiwen, Shi, Cong, Shen, Siqian
Format:	Artikel
Sprache:	eng
Schlagworte:	MATHEMATICS AND COMPUTING multi-armed bandit sub-exponential reward unbounded reward upper confidence bound
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	In this work we consider a general class of multi-armed bandits (MAB) problems with sub-exponential rewards. This is primarily motivated by service systems with exponential inter-arrival and service distributions. It is well-known that the celebrated Upper Confidence Bound (UCB) algorithm can achieve tight regret bound for MAB under sub-Gaussian rewards. There has been subsequent work by Bubeck et al. (2013) [4] extending this tightness result to any reward distributions with finite variance by leveraging robust mean estimators. In this paper, we present three alternative UCB based algorithms, termed UCB-Rad, UCB-Warm, and UCB-Hybrid, specifically for MAB with sub-exponential rewards. While not being the first to achieve tight regret bounds, these algorithms are conceptually simpler and provide a more explicit analysis for this problem. Moreover, we present a rental bike revenue management application and conduct numerical experiments. We find that UCB-Warm and UCB-Hybrid outperform UCB-Rad in our computational experiments.
ISSN:	0167-6377 1872-7468