Proxy Experience Replay: Federated Distillation for Distributed Reinforcement Learning

Bibliographic Details
Published in:	IEEE Intelligent Systems, 2020-07, Vol. 35 (4), pp. 94-101
Main authors:	Cha, Han; Park, Jihong; Kim, Hyesung; Bennis, Mehdi; Kim, Seong-Lyun
Format:	Article
Language:	English
Subjects:	Ablation; Algorithms; Artificial neural networks; Clustering; Communication; Completion time; Computer architecture; Computer simulation; Deep learning; Distillation; Distributed Artificial Intelligence; Intelligent systems; Learning; Learning (artificial intelligence); Machine learning; Neural networks; Open area test sites; Payloads; Privacy; Reinforcement learning; Servers; Wireless Communication

Description
Abstract:	Traditional distributed deep reinforcement learning (RL) commonly relies on exchanging the experience replay memory (RM) of each agent. Since the RM contains all state observations and action policy history, it may incur huge communication overhead while violating the privacy of each agent. Alternatively, this article presents a communication-efficient and privacy-preserving distributed RL framework, coined federated reinforcement distillation (FRD). In FRD, each agent exchanges its proxy experience RM (ProxRM), in which policies are locally averaged with respect to proxy states clustering actual states. To provide FRD design insights, we present ablation studies on the impact of ProxRM structures, neural network architectures, and communication intervals. Furthermore, we propose an improved version of FRD, coined mixup augmented FRD (MixFRD), in which ProxRM is interpolated using the mixup data augmentation algorithm. Simulations in a Cartpole environment validate the effectiveness of MixFRD in reducing the variance of mission completion time and communication cost, compared to the benchmark schemes, vanilla FRD, federated RL (FRL), and policy distillation.
ISSN:	1541-1672, 1941-1294
DOI:	10.1109/MIS.2020.2994942
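
Illustrative sketch:	The abstract describes two steps that are easier to picture in code: building a ProxRM by averaging policies over proxy states, and interpolating ProxRM entries with mixup. The following is a minimal sketch and not the authors' implementation. It assumes that proxy states are the centers of a uniform grid over the state space (the record does not say how states are clustered) and that mixup draws its weight from a Beta(alpha, alpha) distribution. The names `build_proxy_memory`, `mixup_proxy_memory`, `n_bins`, and `alpha` are hypothetical.

```python
import numpy as np


def build_proxy_memory(states, policies, n_bins, low, high):
    """Average policies over proxy states.

    Assumption: a proxy state is the center of a uniform grid cell.
    The source does not specify the clustering rule.
    """
    states = np.asarray(states, dtype=float)
    policies = np.asarray(policies, dtype=float)
    low = np.asarray(low, dtype=float)
    high = np.asarray(high, dtype=float)

    # Map every actual state to the index of its grid cell per dimension.
    scaled = (states - low) / (high - low)
    cells = np.clip((scaled * n_bins).astype(int), 0, n_bins - 1)

    # Group the policies observed in the same cell.
    grouped = {}
    for cell, policy in zip(map(tuple, cells), policies):
        grouped.setdefault(cell, []).append(policy)

    # One entry per visited cell: (cell center, mean policy).
    width = (high - low) / n_bins
    proxy_states, avg_policies = [], []
    for cell in sorted(grouped):
        proxy_states.append(low + (np.array(cell) + 0.5) * width)
        avg_policies.append(np.mean(grouped[cell], axis=0))
    return np.array(proxy_states), np.array(avg_policies)


def mixup_proxy_memory(proxy_states, avg_policies, n_new, alpha=1.0, rng=None):
    """Create n_new interpolated entries from random pairs of ProxRM entries."""
    rng = np.random.default_rng() if rng is None else rng
    i = rng.integers(0, len(proxy_states), size=n_new)
    j = rng.integers(0, len(proxy_states), size=n_new)
    # Mixup weight in [0, 1], one per new entry.
    lam = rng.beta(alpha, alpha, size=(n_new, 1))
    new_states = lam * proxy_states[i] + (1 - lam) * proxy_states[j]
    new_policies = lam * avg_policies[i] + (1 - lam) * avg_policies[j]
    return new_states, new_policies
```

Because each interpolated policy is a convex combination of two averaged policies, it remains a valid probability distribution over actions. Under these assumptions an agent would share only the (proxy state, averaged policy) pairs rather than its raw replay memory.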