Regret Minimization in Heavy-Tailed Bandits
| Main authors: | , , |
| --- | --- |
| Format: | Article |
| Language: | eng |
| Subjects: | |
| Online access: | Order full text |
Abstract: We revisit the classic regret-minimization problem in the stochastic multi-armed bandit setting when the arm distributions are allowed to be heavy-tailed. Regret minimization has been well studied in the simpler settings of either bounded-support reward distributions or distributions that belong to a single-parameter exponential family. We work under the much weaker assumption that the moments of order $(1+\epsilon)$ are uniformly bounded by a known constant $B$, for some given $\epsilon > 0$; that is, $\mathbb{E}\big[|X|^{1+\epsilon}\big] \le B$ for the reward distribution of every arm. We propose an optimal algorithm that matches the lower bound exactly in the first-order term, and we give a finite-time bound on its regret. We show that our index concentrates faster than the well-known truncated or trimmed empirical-mean estimators for the mean of heavy-tailed distributions. Computing our index can be computationally demanding. To address this, we develop a batch-based algorithm that is optimal up to a multiplicative constant depending on the batch size, and hence provide a controlled trade-off between statistical optimality and computational cost.
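The truncated empirical mean mentioned in the abstract zeroes out any sample whose magnitude exceeds a threshold that grows with the sample index; under the moment bound $\mathbb{E}[|X|^{1+\epsilon}] \le B$ this trades a small, controlled bias for exponential concentration that the plain empirical mean lacks when only $(1+\epsilon)$-moments exist. Below is a minimal Python sketch of that baseline estimator, not the paper's index; the threshold schedule $(B\,i/\log(1/\delta))^{1/(1+\epsilon)}$, the confidence parameter `delta`, and all numerical constants in the demo are illustrative assumptions.

```python
import numpy as np

def truncated_mean(samples, B, eps, delta=0.05):
    # Truncated empirical mean for heavy-tailed data, assuming the raw
    # moment bound E[|X|^(1+eps)] <= B. Sample i is zeroed out whenever
    # |X_i| exceeds the growing threshold (B*i / log(1/delta))^(1/(1+eps)).
    x = np.asarray(samples, dtype=float)
    i = np.arange(1, len(x) + 1)
    thresh = (B * i / np.log(1.0 / delta)) ** (1.0 / (1.0 + eps))
    return float(np.mean(np.where(np.abs(x) <= thresh, x, 0.0)))

# Demo: Lomax/Pareto-II(a=1.2) rewards have finite (1+eps)-moments only
# for eps < 0.2, so the plain empirical mean is unreliable here.
rng = np.random.default_rng(0)
x = rng.pareto(1.2, size=100_000)
print("empirical mean :", x.mean())
print("truncated mean :", truncated_mean(x, B=50.0, eps=0.1))
```

On such Pareto-type data the plain mean fluctuates heavily across random seeds while the truncated mean remains stable; the abstract's claim is that the paper's index concentrates faster still.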
DOI: 10.48550/arxiv.2102.03734