Efficient Episodic Memory Utilization of Cooperative Multi-Agent Reinforcement Learning
Saved in:
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Summary: In cooperative multi-agent reinforcement learning (MARL), agents aim to achieve a common goal, such as defeating enemies or scoring a goal. Existing MARL algorithms are effective but still require significant learning time and often get trapped in local optima on complex tasks, subsequently failing to discover a goal-reaching policy. To address this, we introduce Efficient episodic Memory Utilization (EMU) for MARL, with two primary objectives: (a) accelerating reinforcement learning by leveraging semantically coherent memory from an episodic buffer and (b) selectively promoting desirable transitions to prevent convergence to local optima. To achieve (a), EMU incorporates a trainable encoder/decoder structure alongside MARL, creating coherent memory embeddings that facilitate exploratory memory recall. To achieve (b), EMU introduces a novel reward structure, the episodic incentive, based on the desirability of states. This reward improves the TD target in Q-learning and acts as an additional incentive for desirable transitions. We provide theoretical support for the proposed incentive and demonstrate the effectiveness of EMU compared to conventional episodic control. The proposed method is evaluated in StarCraft II and Google Research Football, and empirical results indicate further performance improvement over state-of-the-art methods.
DOI: 10.48550/arxiv.2403.01112
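
The abstract states that the episodic incentive enters Q-learning by improving the TD target for desirable transitions. The snippet below is a minimal, illustrative sketch of that idea only, not the authors' implementation: the episodic-buffer lookup, the desirability signal, and the weighting `lambda_e` are assumptions introduced here to show how a state-dependent bonus could be folded into a TD target.

```python
# Illustrative sketch only: how an episodic incentive might augment a TD target.
# The buffer lookup, desirability signal, and lambda_e weighting are assumptions,
# not the EMU paper's actual formulation.

import torch

def td_target_with_episodic_incentive(
    reward,            # (batch,) environment reward r_t
    next_q_max,        # (batch,) max_a Q_target(s_{t+1}, a) from the target network
    episodic_return,   # (batch,) best return recalled from the episodic buffer
    desirability,      # (batch,) in [0, 1]: how desirable the recalled state is
    gamma=0.99,        # discount factor
    lambda_e=0.1,      # assumed weight of the episodic incentive
):
    """Standard one-step TD target plus a bonus for desirable transitions."""
    # Add an incentive only when the recalled episodic return exceeds the
    # bootstrapped estimate, scaled by how desirable the recalled state is.
    incentive = lambda_e * desirability * (episodic_return - next_q_max).clamp(min=0.0)
    return reward + incentive + gamma * next_q_max
```

In this sketch the bonus vanishes whenever the bootstrapped estimate already matches or exceeds the recalled episodic return, so the incentive selectively pushes learning toward transitions the episodic memory marks as desirable rather than uniformly inflating all targets.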