Offline Multi-Agent Reinforcement Learning with Coupled Value Factorization
Format: Article
Language: English
Abstract: Offline reinforcement learning (RL), which learns policies from offline datasets without environment interaction, has received considerable attention in recent years. Compared with the rich literature in the single-agent case, offline multi-agent RL is still a relatively underexplored area. Most existing methods directly apply offline RL ingredients in the multi-agent setting without fully leveraging the decomposable problem structure, leading to less satisfactory performance in complex tasks. We present OMAC, a new offline multi-agent RL algorithm with coupled value factorization. OMAC adopts a coupled value factorization scheme that decomposes the global value function into local and shared components, and also maintains credit-assignment consistency between the state-value and Q-value functions. Moreover, OMAC performs in-sample learning on the decomposed local state-value functions, which implicitly conducts the max-Q operation at the local level while avoiding the distributional shift caused by evaluating out-of-distribution actions. Based on comprehensive evaluations on offline multi-agent StarCraft II micro-management tasks, we demonstrate the superior performance of OMAC over state-of-the-art offline multi-agent RL methods.
DOI: 10.48550/arxiv.2306.08900
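
The abstract does not give OMAC's exact equations, so the following is only a minimal PyTorch sketch of the general idea it describes: a state-dependent mixer whose credit weights are shared between the Q-value mixing and the state-value mixing (the "coupled" consistency between Q and V), plus in-sample learning of the local state-value functions via expectile regression against Q-values of dataset actions only, so no out-of-distribution action is ever evaluated. The QMIX-style softplus weight constraint, the IQL-style expectile objective, the class names (LocalCritic, CoupledMixer), and all dimensions and hyperparameters are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch of coupled value factorization with in-sample value learning.
# All architectural and hyperparameter choices here are assumptions for
# illustration only; they are not taken from the OMAC paper.
import torch
import torch.nn as nn
import torch.nn.functional as F

N_AGENTS, OBS_DIM, STATE_DIM, N_ACTIONS = 3, 16, 32, 5


class LocalCritic(nn.Module):
    """Per-agent local Q_i(o_i, .) and local state-value V_i(o_i)."""
    def __init__(self):
        super().__init__()
        self.q = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                               nn.Linear(64, N_ACTIONS))
        self.v = nn.Sequential(nn.Linear(OBS_DIM, 64), nn.ReLU(),
                               nn.Linear(64, 1))

    def forward(self, obs):
        return self.q(obs), self.v(obs).squeeze(-1)


class CoupledMixer(nn.Module):
    """State-dependent non-negative weights w_i(s) and bias b(s), shared by
    the Q-mixing and the V-mixing so credit assignment stays consistent."""
    def __init__(self):
        super().__init__()
        self.w = nn.Linear(STATE_DIM, N_AGENTS)
        self.b = nn.Linear(STATE_DIM, 1)

    def forward(self, local_vals, state):
        w = F.softplus(self.w(state))  # non-negative credit weights
        return (w * local_vals).sum(-1) + self.b(state).squeeze(-1)


def expectile_loss(diff, tau=0.7):
    """Asymmetric L2 loss: with tau > 0.5, V is pushed toward the upper
    envelope of in-sample Q-values, i.e. an implicit max over dataset
    actions only."""
    weight = torch.abs(tau - (diff < 0).float())
    return (weight * diff ** 2).mean()


critics = nn.ModuleList([LocalCritic() for _ in range(N_AGENTS)])
mixer = CoupledMixer()
opt = torch.optim.Adam(list(critics.parameters()) + list(mixer.parameters()),
                       lr=3e-4)

# One gradient step on a dummy offline batch (batch size 8).
obs = torch.randn(8, N_AGENTS, OBS_DIM)
actions = torch.randint(N_ACTIONS, (8, N_AGENTS))
state = torch.randn(8, STATE_DIM)
reward = torch.randn(8)
next_obs = torch.randn(8, N_AGENTS, OBS_DIM)
next_state = torch.randn(8, STATE_DIM)
gamma = 0.99

q_taken, v_local, next_v_local = [], [], []
for i, critic in enumerate(critics):
    q_i, v_i = critic(obs[:, i])
    q_taken.append(q_i.gather(-1, actions[:, i:i + 1]).squeeze(-1))
    v_local.append(v_i)
    with torch.no_grad():
        _, nv_i = critic(next_obs[:, i])
    next_v_local.append(nv_i)

q_taken = torch.stack(q_taken, dim=-1)        # (B, N)
v_local = torch.stack(v_local, dim=-1)
next_v_local = torch.stack(next_v_local, dim=-1)

# In-sample value learning: each V_i regresses only toward Q_i of actions
# that appear in the dataset, so no out-of-distribution action is queried.
v_loss = expectile_loss(q_taken.detach() - v_local)

# TD target for the mixed Q uses the mixed next-state V with the same mixer
# weights (a separate target network is typically used; omitted for brevity).
q_tot = mixer(q_taken, state)
with torch.no_grad():
    target = reward + gamma * mixer(next_v_local, next_state)
q_loss = F.mse_loss(q_tot, target)

opt.zero_grad()
(v_loss + q_loss).backward()
opt.step()
```

Sharing the mixer between the Q- and V-aggregation is one way to realize the credit-assignment consistency the abstract mentions: the same w_i(s) that determines how much agent i's Q-value contributes to the global Q also determines its contribution to the global state value, while the expectile target keeps all value learning strictly in-sample.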