Multi-armed bandit with sub-exponential rewards
In this work we consider a general class of multi-armed bandits (MAB) problems with sub-exponential rewards. This is primarily motivated by service systems with exponential inter-arrival and service distributions. It is well-known that the celebrated Upper Confidence Bound (UCB) algorithm can achiev...
Gespeichert in:
Veröffentlicht in: | Operations research letters 2021-08, Vol.49 (5) |
---|---|
Hauptverfasser: | , , |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | In this work we consider a general class of multi-armed bandits (MAB) problems with sub-exponential rewards. This is primarily motivated by service systems with exponential inter-arrival and service distributions. It is well-known that the celebrated Upper Confidence Bound (UCB) algorithm can achieve tight regret bound for MAB under sub-Gaussian rewards. There has been subsequent work by Bubeck et al. (2013) [4] extending this tightness result to any reward distributions with finite variance by leveraging robust mean estimators. In this paper, we present three alternative UCB based algorithms, termed UCB-Rad, UCB-Warm, and UCB-Hybrid, specifically for MAB with sub-exponential rewards. While not being the first to achieve tight regret bounds, these algorithms are conceptually simpler and provide a more explicit analysis for this problem. Moreover, we present a rental bike revenue management application and conduct numerical experiments. We find that UCB-Warm and UCB-Hybrid outperform UCB-Rad in our computational experiments. |
---|---|
ISSN: | 0167-6377 1872-7468 |