Sub-sampling for Efficient Non-Parametric Bandit Exploration
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Summary: | In this paper we propose the first multi-armed bandit algorithm based on
re-sampling that achieves asymptotically optimal regret simultaneously for
different families of arms (namely Bernoulli, Gaussian and Poisson
distributions). Unlike Thompson Sampling, which requires specifying a different
prior to be optimal in each case, our proposal, RB-SDA, does not need any
distribution-dependent tuning. RB-SDA belongs to the family of Sub-sampling
Duelling Algorithms (SDA), which combines the sub-sampling idea first used by
the BESA [1] and SSMC [2] algorithms with different sub-sampling schemes. In
particular, RB-SDA uses Random Block sampling. We perform an experimental study
assessing the flexibility and robustness of this promising novel approach for
exploration in bandit models. |
---|---|
DOI: | 10.48550/arxiv.2010.14323 |
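
The summary names the mechanism (a duel decided on a sub-sample drawn as a random block) without spelling it out. The following is a minimal sketch of one such duel, not the authors' implementation. It assumes that a "random block" is a contiguous window of the leader's reward history with a uniformly random start, that the leader is the arm with more observations, and that ties go to the challenger. The function names `random_block` and `challenger_wins` are hypothetical and do not come from the record.

```python
import random
from statistics import fmean


def random_block(history, size, rng):
    """Return `size` consecutive observations from `history`.

    Assumption: the block start is drawn uniformly among all valid starts.
    """
    if size > len(history):
        raise ValueError("block size exceeds history length")
    start = rng.randrange(len(history) - size + 1)
    return history[start:start + size]


def challenger_wins(challenger_history, leader_history, rng):
    """Compare the challenger's mean with the mean of an equally sized
    random block of the leader's history.

    Assumption: the leader has at least as many observations as the
    challenger, and a tie counts as a win for the challenger.
    """
    block = random_block(leader_history, len(challenger_history), rng)
    return fmean(challenger_history) >= fmean(block)


rng = random.Random(0)
leader = [0.2, 0.9, 0.4, 0.7, 0.1, 0.6]   # illustrative rewards only
challenger = [0.8, 0.5]
print(challenger_wins(challenger, leader, rng))
```

Because both means are computed on the same number of observations, the comparison needs no distribution-specific confidence bound or prior, which is consistent with the summary's claim that no distribution-dependent tuning is required.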