Differential Privacy for Multi-armed Bandits: What Is It and What Is Its Cost?
Authors: | , , |
---|---|
Format: | Article |
Language: | English |
Abstract: | Based on the differential privacy (DP) framework, we introduce and unify privacy
definitions for multi-armed bandit algorithms. We represent the framework
with a unified graphical model and use it to connect the privacy definitions. We
derive and contrast lower bounds on the regret of bandit algorithms satisfying
these definitions, and we obtain all of these lower bounds with a single unified
proof technique. We show that, for every definition, the learner's regret increases
by a multiplicative factor that depends on the privacy level $\epsilon$. We
also observe that this dependency is weaker when local differential privacy
is not required for the rewards. |
DOI: | 10.48550/arxiv.1905.12298 |
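The abstract contrasts settings with and without local differential privacy for the rewards. As a rough illustration only (not the authors' algorithm or proof technique), the sketch below assumes Bernoulli rewards in [0, 1] and privatizes each reward locally with the standard Laplace mechanism: because a reward in [0, 1] has sensitivity 1, adding Laplace(0, 1/ε) noise makes each released value ε-differentially private. The names `privatize_reward` and `ucb_with_local_dp`, and the choice to widen UCB1-style confidence bounds by the noise standard deviation √2/ε, are hypothetical illustrative choices, not a tuned or provably optimal algorithm.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample from Laplace(0, scale) as a scaled difference of two exponentials."""
    # If E1, E2 ~ Exp(1) are independent, then scale * (E1 - E2) ~ Laplace(0, scale).
    return scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def privatize_reward(reward: float, epsilon: float, rng: random.Random) -> float:
    """Release a reward in [0, 1] under local epsilon-DP via the Laplace mechanism.

    The identity map on [0, 1] has sensitivity 1, so the noise scale is 1 / epsilon.
    """
    if not 0.0 <= reward <= 1.0:
        raise ValueError("reward must lie in [0, 1]")
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    return reward + laplace_noise(1.0 / epsilon, rng)


def ucb_with_local_dp(means, horizon, epsilon, seed=0):
    """Hypothetical UCB1-style learner that only ever sees privatized rewards.

    Returns the pseudo-regret: sum over rounds of (best mean - mean of chosen arm).
    The confidence width is inflated by the Laplace noise standard deviation
    sqrt(2) / epsilon; this is an illustrative heuristic, not a proven bound.
    """
    rng = random.Random(seed)
    k = len(means)
    counts = [0] * k
    sums = [0.0] * k  # sums of privatized (noisy) rewards per arm
    best = max(means)
    noise_sd = math.sqrt(2.0) / epsilon
    regret = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # initialization: pull each arm once
        else:
            def index(a):
                width = (1.0 + noise_sd) * math.sqrt(2.0 * math.log(t) / counts[a])
                return sums[a] / counts[a] + width
            arm = max(range(k), key=index)
        reward = 1.0 if rng.random() < means[arm] else 0.0  # Bernoulli reward
        sums[arm] += privatize_reward(reward, epsilon, rng)  # learner sees noisy value
        counts[arm] += 1
        regret += best - means[arm]
    return regret
```

For example, comparing `ucb_with_local_dp([0.2, 0.8], horizon=2000, epsilon=0.5)` with the same call at `epsilon=5.0` lets one observe empirically that stronger privacy (smaller ε) tends to increase regret in this sketch. The numbers depend on the seed and the heuristic width, and they say nothing about the constants in the paper's lower bounds.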