Gamification of Pure Exploration for Linear Bandits
Saved in:
Main Authors:
Format: Article
Language: eng
Subjects:
Online Access: Order full text
Abstract: We investigate an active pure-exploration setting, which includes best-arm identification, in the context of linear stochastic bandits. While asymptotically optimal algorithms exist for standard multi-armed bandits, the existence of such algorithms for best-arm identification in linear bandits has been elusive despite several attempts to address it. First, we provide a thorough comparison and new insights into different notions of optimality in the linear case, including G-optimality, transductive optimality from optimal experimental design, and asymptotic optimality. Second, we design the first asymptotically optimal algorithm for fixed-confidence pure exploration in linear bandits. As a consequence, our algorithm naturally bypasses the pitfall caused by a simple but difficult instance that most prior algorithms had to be engineered to handle explicitly. Finally, we avoid the need to fully solve an optimal design problem by providing an approach that admits an efficient implementation.
DOI: 10.48550/arxiv.2007.00953
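
The abstract contrasts G-optimality with asymptotic optimality without stating the underlying criteria. The LaTeX sketch below records the standard formulations from the fixed-confidence linear best-arm-identification literature; the notation (arm set $\mathcal{A}$, unknown parameter $\theta$, design $\lambda$, information matrix $A(\lambda)$, stopping time $\tau_\delta$) is generic and may differ from the paper's own.

```latex
% Standard notions referenced in the abstract (generic notation, unit-variance
% Gaussian noise assumed; not necessarily the paper's exact definitions).
% Arms a \in \mathcal{A} \subset \mathbb{R}^d, design \lambda in the simplex
% \Delta_{\mathcal{A}}, information matrix A(\lambda) = \sum_a \lambda_a a a^\top.

% G-optimal design: minimize the worst-case prediction variance over arms.
\[
  \lambda^{G} \in \arg\min_{\lambda \in \Delta_{\mathcal{A}}}
    \max_{a \in \mathcal{A}} \|a\|_{A(\lambda)^{-1}}^{2},
  \qquad
  A(\lambda) = \sum_{a \in \mathcal{A}} \lambda_a \, a a^{\top}.
\]

% Asymptotic (instance-dependent) optimality: a fixed-confidence algorithm is
% asymptotically optimal if its expected stopping time matches the
% characteristic time T^*(\theta) as the confidence level \delta \to 0.
\[
  T^{*}(\theta)^{-1}
  = \max_{\lambda \in \Delta_{\mathcal{A}}}
    \min_{b \neq a^{*}(\theta)}
    \frac{\langle \theta,\, a^{*}(\theta) - b \rangle^{2}}
         {2 \, \| a^{*}(\theta) - b \|_{A(\lambda)^{-1}}^{2}},
  \qquad
  \limsup_{\delta \to 0}
    \frac{\mathbb{E}[\tau_\delta]}{\log(1/\delta)} \le T^{*}(\theta).
\]
```

The gap between these two criteria is what the abstract alludes to: a G-optimal design depends only on the geometry of the arm set, whereas the characteristic time depends on the unknown instance $\theta$, which is why asymptotically optimal algorithms must adapt their sampling to the estimated parameter rather than follow a fixed design.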