Collaborative Linear Bandits with Adversarial Agents: Near-Optimal Regret Bounds


Bibliographic Details
Published in: arXiv.org, 2022-06
Main authors: Mitra, Aritra; Adibi, Arman; Pappas, George J; Hassani, Hamed
Format: Article
Language: English
Online access: Full text
Description
Abstract: We consider a linear stochastic bandit problem involving \(M\) agents that can collaborate via a central server to minimize regret. A fraction \(\alpha\) of these agents are adversarial and can act arbitrarily, leading to the following tension: while collaboration can potentially reduce regret, it can also disrupt the process of learning due to adversaries. In this work, we provide a fundamental understanding of this tension by designing new algorithms that balance the exploration-exploitation trade-off via carefully constructed robust confidence intervals. We also complement our algorithms with tight analyses. First, we develop a robust collaborative phased elimination algorithm that achieves \(\tilde{O}\left(\left(\alpha + 1/\sqrt{M}\right)\sqrt{dT}\right)\) regret for each good agent; here, \(d\) is the model dimension and \(T\) is the horizon. For small \(\alpha\), our result thus reveals a clear benefit of collaboration despite adversaries. Using an information-theoretic argument, we then prove a matching lower bound, thereby providing the first set of tight, near-optimal regret bounds for collaborative linear bandits with adversaries. Furthermore, by leveraging recent advances in high-dimensional robust statistics, we significantly extend our algorithmic ideas and results to (i) the generalized linear bandit model, which allows for non-linear observation maps; and (ii) the contextual bandit setting, which allows for time-varying feature vectors.
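To make the algorithmic idea concrete, below is a minimal illustrative sketch, not the authors' algorithm: each good agent reports per-arm mean-reward estimates to the server, the server aggregates reports with a coordinate-wise median to tolerate the \(\alpha\)-fraction of adversaries, and arms are eliminated against a confidence width that mirrors the \(\alpha + 1/\sqrt{M}\) scaling in the stated bound. All function names, constants, and the choice of median aggregation are assumptions for illustration; the sketch also simplifies to a finite arm set, whereas the paper constructs robust confidence intervals for the full linear model.

```python
import numpy as np

rng = np.random.default_rng(0)

# --- Hypothetical problem setup (illustrative, not from the paper) ---
d, K, M = 4, 20, 10          # dimension, number of arms, number of agents
alpha = 0.2                  # fraction of adversarial agents
n_bad = int(alpha * M)
theta_star = rng.normal(size=d)
theta_star /= np.linalg.norm(theta_star)
arms = rng.normal(size=(K, d))
arms /= np.linalg.norm(arms, axis=1, keepdims=True)

def agent_estimates(active, pulls, adversarial):
    """Each agent pulls every active arm `pulls` times and reports
    per-arm mean rewards; adversarial agents report arbitrary values."""
    if adversarial:
        return rng.uniform(-1, 1, size=len(active))   # arbitrary corruption
    means = arms[active] @ theta_star
    noise = rng.normal(scale=1.0, size=(len(active), pulls)).mean(axis=1)
    return means + noise

def robust_phased_elimination(num_phases=8):
    active = np.arange(K)
    for ell in range(num_phases):
        pulls = 2 ** (2 * ell)                        # doubling schedule
        reports = np.array([
            agent_estimates(active, pulls, adversarial=(m < n_bad))
            for m in range(M)
        ])                                            # shape (M, |active|)
        # Robust aggregation: the coordinate-wise median across agents
        # tolerates any alpha < 1/2 fraction of corrupted reports.
        agg = np.median(reports, axis=0)
        # Confidence width mirroring the (alpha + 1/sqrt(M)) scaling of
        # the stated regret bound; the constant 2.0 is illustrative.
        width = 2.0 * (alpha + 1.0 / np.sqrt(M)) / np.sqrt(pulls)
        keep = agg >= agg.max() - 2 * width
        active = active[keep]
        if len(active) == 1:
            break
    return active

print("surviving arms:", robust_phased_elimination())
```

The median aggregator is one simple stand-in for a robust estimator: a lone statistical width would shrink like \(1/\sqrt{M \cdot \text{pulls}}\), while the adversaries contribute an irreducible bias of order \(\alpha\) per unit noise scale, which is exactly why the width above carries both terms.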
ISSN: 2331-8422