Stochastic Multi-Armed Bandits with Unrestricted Delay Distributions
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Summary: | We study the stochastic Multi-Armed Bandit (MAB) problem with random delays
in the feedback received by the algorithm. We consider two settings: the
reward-dependent delay setting, where realized delays may depend on the
stochastic rewards, and the reward-independent delay setting. Our main
contribution is algorithms that achieve near-optimal regret in each of the
settings, with an additional additive dependence on the quantiles of the delay
distribution. Our results do not make any assumptions on the delay
distributions: in particular, we do not assume they come from any parametric
family of distributions and allow for unbounded support and expectation; we
further allow for infinite delays where the algorithm might occasionally not
observe any feedback. |
---|---|
DOI: | 10.48550/arxiv.2106.02436 |