Preserving Expert-Level Privacy in Offline Reinforcement Learning
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: The offline reinforcement learning (RL) problem aims to learn an optimal policy from historical data collected by one or more behavioural policies (experts) through interaction with an environment. However, the individual experts may be privacy-sensitive in that the learnt policy may retain information about their precise choices. In some domains, such as personalized retrieval, advertising, and healthcare, the expert choices are considered sensitive data. To provably protect the privacy of such experts, we propose a novel consensus-based, expert-level differentially private offline RL training approach that is compatible with any existing offline RL algorithm. We prove rigorous differential privacy guarantees while maintaining strong empirical performance. Unlike existing work in differentially private RL, we supplement the theory with proof-of-concept experiments on classic RL environments featuring large continuous state spaces, demonstrating substantial improvements over a natural baseline across multiple tasks.
DOI: 10.48550/arxiv.2411.13598
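
The abstract does not spell out the consensus mechanism, so the following is only a minimal illustrative sketch of the general idea behind expert-level privacy via consensus: a PATE-style noisy majority vote over actions proposed by per-expert policies, with the aggregated labels then usable by any downstream offline RL or imitation learner. The function names, the Laplace mechanism, and the per-query epsilon are assumptions for illustration, not the paper's algorithm.

    # Illustrative sketch only: PATE-style noisy-majority vote over per-expert
    # policies, to convey "expert-level" privacy via consensus. This is not the
    # paper's algorithm; all names and parameters here are hypothetical.
    import numpy as np

    def noisy_consensus_action(state, expert_policies, n_actions, epsilon, rng):
        """Return one privately aggregated action label for `state`.

        Each expert contributes at most one vote, so removing or replacing a
        single expert changes every count by at most 1 (sensitivity 1); the
        Laplace-noised argmax is therefore differentially private at the
        expert level for this single query.
        """
        votes = np.zeros(n_actions)
        for policy in expert_policies:
            votes[policy(state)] += 1.0
        noisy_votes = votes + rng.laplace(scale=1.0 / epsilon, size=n_actions)
        return int(np.argmax(noisy_votes))

    # Toy usage with hypothetical deterministic "experts" over a 1-D state.
    rng = np.random.default_rng(0)
    n_actions = 3
    expert_policies = [lambda s, k=k: int((s + k) % n_actions) for k in range(10)]

    states = rng.uniform(0.0, 10.0, size=5)
    labelled = [
        (s, noisy_consensus_action(s, expert_policies, n_actions, epsilon=1.0, rng=rng))
        for s in states
    ]
    print(labelled)  # (state, privately aggregated action) pairs for offline training

Note that each aggregation query spends privacy budget, so in practice the total epsilon would be accounted for by composition over all labelled states; the paper's actual mechanism and accounting should be taken from the full text rather than this sketch.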