A/B Testing and Best-arm Identification for Linear Bandits with Robustness to Non-stationarity
We investigate the fixed-budget best-arm identification (BAI) problem for linear bandits in a potentially non-stationary environment. Given a finite arm set $\mathcal{X}\subset\mathbb{R}^d$, a fixed budget $T$, and an unpredictable sequence of parameters $\left\lbrace\theta_t\right\rbrace_{t=1}^{T}$...
Gespeichert in:
Hauptverfasser: | , , , , |
---|---|
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | We investigate the fixed-budget best-arm identification (BAI) problem for
linear bandits in a potentially non-stationary environment. Given a finite arm
set $\mathcal{X}\subset\mathbb{R}^d$, a fixed budget $T$, and an unpredictable
sequence of parameters $\left\lbrace\theta_t\right\rbrace_{t=1}^{T}$, an
algorithm will aim to correctly identify the best arm $x^* :=
\arg\max_{x\in\mathcal{X}}x^\top\sum_{t=1}^{T}\theta_t$ with probability as
high as possible. Prior work has addressed the stationary setting where
$\theta_t = \theta_1$ for all $t$ and demonstrated that the error probability
decreases as $\exp(-T /\rho^*)$ for a problem-dependent constant $\rho^*$. But
in many real-world $A/B/n$ multivariate testing scenarios that motivate our
work, the environment is non-stationary and an algorithm expecting a stationary
setting can easily fail. For robust identification, it is well-known that if
arms are chosen randomly and non-adaptively from a G-optimal design over
$\mathcal{X}$ at each time then the error probability decreases as
$\exp(-T\Delta^2_{(1)}/d)$, where $\Delta_{(1)} = \min_{x \neq x^*} (x^* -
x)^\top \frac{1}{T}\sum_{t=1}^T \theta_t$. As there exist environments where
$\Delta_{(1)}^2/ d \ll 1/ \rho^*$, we are motivated to propose a novel
algorithm $\mathsf{P1}$-$\mathsf{RAGE}$ that aims to obtain the best of both
worlds: robustness to non-stationarity and fast rates of identification in
benign settings. We characterize the error probability of
$\mathsf{P1}$-$\mathsf{RAGE}$ and demonstrate empirically that the algorithm
indeed never performs worse than G-optimal design but compares favorably to the
best algorithms in the stationary setting. |
---|---|
DOI: | 10.48550/arxiv.2307.15154 |