Towards Optimal Algorithms for Multi-Player Bandits without Collision Sensing Information
Main Authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | We propose a novel algorithm for multi-player multi-armed bandits without
collision sensing information. Our algorithm circumvents two problems shared by
all state-of-the-art algorithms: it does not need as an input a lower bound on
the minimal expected reward of an arm, and its performance does not scale
inversely proportionally to the minimal expected reward. We prove a theoretical
regret upper bound to justify these claims. We complement our theoretical
results with numerical experiments, showing that the proposed algorithm
outperforms the state of the art in practice as well. |
---|---|
DOI: | 10.48550/arxiv.2103.13059 |