Fixed-Budget Best-Arm Identification in Sparse Linear Bandits
Main authors: | , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | We study the best-arm identification problem in sparse linear bandits under
the fixed-budget setting. In sparse linear bandits, the unknown feature vector
$\theta^*$ may be of large dimension $d$, but only a few, say $s \ll d$, of
these features have non-zero values. We design a two-phase algorithm, Lasso and
Optimal-Design- (Lasso-OD) based linear best-arm identification. The first
phase of Lasso-OD leverages the sparsity of the feature vector by applying the
thresholded Lasso introduced by Zhou (2009), which estimates the support of
$\theta^*$ correctly with high probability using rewards from the selected arms
and a judicious choice of the design matrix. The second phase of Lasso-OD
applies the OD-LinBAI algorithm by Yang and Tan (2022) on that estimated
support. We derive a non-asymptotic upper bound on the error probability of
Lasso-OD by carefully choosing hyperparameters (such as Lasso's regularization
parameter) and balancing the error probabilities of both phases. For fixed
sparsity $s$ and budget $T$, the exponent in the error probability of Lasso-OD
depends on $s$ but not on the dimension $d$, yielding a significant performance
improvement for sparse and high-dimensional linear bandits. Furthermore, we
show that Lasso-OD is almost minimax optimal in the exponent. Finally, we
provide numerical examples to demonstrate the significant performance
improvement over the existing algorithms for non-sparse linear bandits such as
OD-LinBAI, BayesGap, Peace, LinearExploration, and GSE. |
---|---|
DOI: | 10.48550/arxiv.2311.00481 |
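
The two-phase structure described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the support-estimation step below is a plain Lasso (solved by coordinate descent) followed by hard thresholding, a simplification of the thresholded Lasso of Zhou (2009), and the second phase is only represented by restricting the arms to the estimated support, where an algorithm such as OD-LinBAI would then be run. All function names, the regularization parameter `lam`, and the `threshold` value are hypothetical choices made for illustration.

```python
def soft_threshold(x, t):
    """Soft-thresholding operator S(x, t) = sign(x) * max(|x| - t, 0)."""
    if x > t:
        return x - t
    if x < -t:
        return x + t
    return 0.0


def lasso_cd(X, y, lam, n_iter=200):
    """Coordinate descent for (1/(2n)) * ||y - X b||^2 + lam * ||b||_1.

    X is a list of n rows (pulled arms), y the list of n observed rewards.
    """
    n, d = len(X), len(X[0])
    b = [0.0] * d
    resid = list(y)  # residual y - X b; equals y because b starts at 0
    col_sq = [sum(X[i][j] ** 2 for i in range(n)) for j in range(d)]
    for _ in range(n_iter):
        for j in range(d):
            if col_sq[j] == 0.0:
                continue
            # Correlation of column j with the residual that excludes b[j].
            rho = sum(X[i][j] * resid[i] for i in range(n)) + col_sq[j] * b[j]
            new_bj = soft_threshold(rho, n * lam) / col_sq[j]
            delta = new_bj - b[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= X[i][j] * delta
                b[j] = new_bj
    return b


def estimate_support(X, y, lam, threshold):
    """Phase 1 (simplified): keep coordinates whose Lasso estimate is large."""
    b = lasso_cd(X, y, lam)
    return [j for j, bj in enumerate(b) if abs(bj) > threshold]


def restrict_arms(arms, support):
    """Phase 2 input: project every arm onto the estimated support."""
    return [[arm[j] for j in support] for arm in arms]


# Toy, noise-free example with an identity design (an assumption made
# purely so the result is easy to follow by hand).
X = [[1.0 if i == j else 0.0 for j in range(4)] for i in range(4)]
y = [1.0, 0.0, 0.0, -0.8]
support = estimate_support(X, y, lam=0.05, threshold=0.1)
reduced_arms = restrict_arms([[1.0, 2.0, 3.0, 4.0]], support)
```

In this toy case the estimated support contains only the two coordinates with non-zero rewards, so the second phase would operate in 2 dimensions instead of 4. This mirrors the summary's point that the error exponent depends on $s$ rather than $d$.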