Differential Privacy for Multi-armed Bandits: What Is It and What Is Its Cost?

Based on differential privacy (DP) framework, we introduce and unify privacy definitions for the multi-armed bandit algorithms. We represent the framework with a unified graphical model and use it to connect privacy definitions. We derive and contrast lower bounds on the regret of bandit algorithms...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Hauptverfasser: Basu, Debabrota, Dimitrakakis, Christos, Tossou, Aristide
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext bestellen
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Based on differential privacy (DP) framework, we introduce and unify privacy definitions for the multi-armed bandit algorithms. We represent the framework with a unified graphical model and use it to connect privacy definitions. We derive and contrast lower bounds on the regret of bandit algorithms satisfying these definitions. We leverage a unified proving technique to achieve all the lower bounds. We show that for all of them, the learner's regret is increased by a multiplicative factor dependent on the privacy level $\epsilon$. We observe that the dependency is weaker when we do not require local differential privacy for the rewards.
DOI:10.48550/arxiv.1905.12298