Collaborative Linear Bandits with Adversarial Agents: Near-Optimal Regret Bounds
Main authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | We consider a linear stochastic bandit problem involving $M$ agents that can
collaborate via a central server to minimize regret. A fraction $\alpha$ of
these agents are adversarial and can act arbitrarily, leading to the following
tension: while collaboration can potentially reduce regret, it can also disrupt
the process of learning due to adversaries. In this work, we provide a
fundamental understanding of this tension by designing new algorithms that
balance the exploration-exploitation trade-off via carefully constructed robust
confidence intervals. We also complement our algorithms with tight analyses.
First, we develop a robust collaborative phased elimination algorithm that
achieves $\tilde{O}\left(\left(\alpha+ 1/\sqrt{M}\right) \sqrt{dT}\right)$ regret for each
good agent; here, $d$ is the model dimension and $T$ is the horizon. For small
$\alpha$, our result thus reveals a clear benefit of collaboration despite
adversaries. Using an information-theoretic argument, we then prove a matching
lower bound, thereby providing the first set of tight, near-optimal regret
bounds for collaborative linear bandits with adversaries. Furthermore, by
leveraging recent advances in high-dimensional robust statistics, we
significantly extend our algorithmic ideas and results to (i) the generalized
linear bandit model that allows for non-linear observation maps; and (ii) the
contextual bandit setting that allows for time-varying feature vectors. |
---|---|
DOI: | 10.48550/arxiv.2206.02834 |
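
To make the stated regret bound concrete, the following minimal sketch evaluates the factor $\alpha + 1/\sqrt{M}$ that multiplies $\sqrt{dT}$ in the summary. It ignores the constants and logarithmic terms hidden by $\tilde{O}$, so the numbers indicate scaling only, not actual regret. The function names `collaboration_factor` and `regret_scale` are hypothetical and chosen for illustration; they do not come from the paper.

```python
import math


def collaboration_factor(alpha: float, num_agents: int) -> float:
    """Return alpha + 1/sqrt(M), the factor in front of sqrt(d*T) in the stated bound.

    alpha is the fraction of adversarial agents, num_agents is M.
    Constants and log factors hidden by the O-tilde notation are ignored.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be a fraction in [0, 1]")
    if num_agents < 1:
        raise ValueError("num_agents must be at least 1")
    return alpha + 1.0 / math.sqrt(num_agents)


def regret_scale(alpha: float, num_agents: int, dim: int, horizon: int) -> float:
    """Return (alpha + 1/sqrt(M)) * sqrt(d*T), the scaling of the stated bound."""
    return collaboration_factor(alpha, num_agents) * math.sqrt(dim * horizon)


if __name__ == "__main__":
    # Illustrative values only (assumptions, not taken from the paper).
    dim, horizon = 10, 1000
    for alpha, num_agents in [(0.0, 1), (0.0, 100), (0.1, 100), (0.3, 100)]:
        scale = regret_scale(alpha, num_agents, dim, horizon)
        print(f"alpha={alpha:.1f}, M={num_agents:3d}: scale ~ {scale:.1f}")
```

With $M = 1$ and $\alpha = 0$ the factor equals 1, so the expression reduces to $\sqrt{dT}$. As $M$ grows, the $1/\sqrt{M}$ term shrinks and $\alpha$ becomes the dominant term, which is why the summary describes the benefit of collaboration as holding for small $\alpha$.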