Tight Memory-Regret Lower Bounds for Streaming Bandits
Format: | Article |
Language: | English |
Abstract: | In this paper, we investigate the streaming bandits problem, in which the
learner aims to minimize regret while handling arms that arrive online and
using sublinear arm memory. We establish the tight worst-case regret lower bound of
$\Omega \left( (TB)^{\alpha} K^{1-\alpha}\right), \alpha = 2^{B} / (2^{B+1}-1)$
for any algorithm with time horizon $T$, number of arms $K$, and number of
passes $B$. The result reveals a separation between the stochastic bandits
problem in the classical centralized setting and the streaming setting with
bounded arm memory. Notably, compared with the well-known
$\Omega(\sqrt{KT})$ lower bound, an additional double logarithmic factor is
unavoidable for any streaming bandits algorithm that uses sublinear memory.
Furthermore, we establish the first instance-dependent lower bound
of $\Omega \left(T^{1/(B+1)} \sum_{\Delta_x>0} \frac{\mu^*}{\Delta_x}\right)$
for streaming bandits, where $\mu^*$ is the mean reward of the best arm and
$\Delta_x = \mu^* - \mu_x$ is the gap of arm $x$. These lower bounds are derived
through a novel reduction from the regret-minimization setting to the sample
complexity analysis of a sequence of $\epsilon$-optimal arm identification tasks,
which may be of independent interest. To complement the lower bound, we also
provide a multi-pass algorithm that achieves a regret upper bound of
$\tilde{O} \left( (TB)^{\alpha} K^{1 - \alpha}\right)$ using constant arm memory. |
DOI: | 10.48550/arxiv.2306.07903 |
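To make the dependence on the number of passes $B$ concrete, the short sketch below evaluates the two lower-bound expressions stated in the abstract. It is an illustrative calculation only, not the paper's algorithm or proof: the constants hidden by $\Omega$ are assumed to be 1, the function names `alpha`, `worst_case_rate`, and `instance_rate` are hypothetical, and `means` is an assumed list of arm means used to form the gaps $\Delta_x = \mu^* - \mu_x$.

```python
from typing import Sequence


def alpha(num_passes: int) -> float:
    """Exponent alpha = 2^B / (2^(B+1) - 1) from the worst-case lower bound."""
    if num_passes < 1:
        raise ValueError("number of passes B must be a positive integer")
    return 2 ** num_passes / (2 ** (num_passes + 1) - 1)


def worst_case_rate(horizon: int, num_arms: int, num_passes: int) -> float:
    """(T*B)^alpha * K^(1 - alpha); the constant hidden by Omega is assumed to be 1."""
    a = alpha(num_passes)
    return (horizon * num_passes) ** a * num_arms ** (1 - a)


def instance_rate(horizon: int, num_passes: int, means: Sequence[float]) -> float:
    """T^(1/(B+1)) * sum over arms with positive gap of mu* / Delta_x (constant assumed 1)."""
    mu_star = max(means)
    # Only suboptimal arms (Delta_x > 0) contribute to the sum.
    gap_sum = sum(mu_star / (mu_star - m) for m in means if mu_star - m > 0)
    return horizon ** (1 / (num_passes + 1)) * gap_sum


if __name__ == "__main__":
    T, K = 10**6, 100
    for B in range(1, 6):
        # Compare the streaming rate with the centralized sqrt(K*T) rate.
        ratio = worst_case_rate(T, K, B) / (K * T) ** 0.5
        print(f"B={B}: alpha={alpha(B):.4f}, rate / sqrt(KT) = {ratio:.2f}")
```

For instance, `alpha(1)` equals 2/3 and `alpha(2)` equals 4/7. In general $\alpha - 1/2 = 1/(2(2^{B+1}-1))$, so the exponent on $T$ approaches the centralized value $1/2$ only as $B$ grows, which hints at why on the order of $\log\log T$ passes are needed before the extra factor $T^{\alpha - 1/2}$ becomes a constant.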