Differential Privacy for Multi-armed Bandits: What Is It and What Is Its Cost?

Bibliographic Details
Published in:	arXiv.org 2020-06
Main authors:	Basu, Debabrota; Dimitrakakis, Christos; Tossou, Aristide
Format:	Article
Language:	English
Subjects:	Algorithms; Dependence; Graphical representations; Lower bounds; Multi-armed bandit problems; Privacy
Online access:	Full text

Description
Summary:	Based on the differential privacy (DP) framework, we introduce and unify privacy definitions for multi-armed bandit algorithms. We represent the framework with a unified graphical model and use it to connect the privacy definitions. We derive and contrast lower bounds on the regret of bandit algorithms satisfying these definitions. We leverage a unified proof technique to obtain all the lower bounds. We show that for all of them, the learner's regret is increased by a multiplicative factor dependent on the privacy level \(\epsilon\). We observe that the dependency is weaker when we do not require local differential privacy for the rewards.
ISSN:	2331-8422