QTRAN++: Improved Value Transformation for Cooperative Multi-Agent Reinforcement Learning
Saved in:
Main authors:
Format: Article
Language: eng
Subjects:
Online access: Order full text
Abstract: QTRAN is a multi-agent reinforcement learning (MARL) algorithm capable of learning the largest class of joint-action value functions to date. However, despite its strong theoretical guarantee, it has shown poor empirical performance in complex environments such as the StarCraft Multi-Agent Challenge (SMAC). In this paper, we identify the performance bottleneck of QTRAN and propose a substantially improved version, coined QTRAN++. Our gains come from (i) stabilizing the training objective of QTRAN, (ii) removing the strict role separation between the action-value estimators of QTRAN, and (iii) introducing a multi-head mixing network for value transformation. Through extensive evaluation, we confirm that our diagnosis is correct, and QTRAN++ successfully bridges the gap between empirical performance and theoretical guarantee. In particular, QTRAN++ achieves new state-of-the-art performance in the SMAC environment. The code will be released.
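The abstract only names a "multi-head mixing network for value transformation" without architectural details. Below is a minimal, hedged sketch of one plausible reading of that idea, in the spirit of hypernetwork-based mixers (e.g. QMIX-style): every name and dimension (MultiHeadMixer, n_heads, embed_dim, etc.) is an illustrative assumption, not the architecture published with QTRAN++.

```python
# Hedged sketch (assumed design, not the authors' implementation): several
# state-conditioned mixing heads each combine per-agent utilities Q_i into a
# candidate joint value; the heads are averaged into a single estimate.
import torch
import torch.nn as nn


class MultiHeadMixer(nn.Module):
    def __init__(self, n_agents: int, state_dim: int, n_heads: int = 4, embed_dim: int = 32):
        super().__init__()
        self.n_agents = n_agents
        self.n_heads = n_heads
        # One small hypernetwork per head maps the global state to
        # non-negative per-agent mixing weights (monotonic mixing).
        self.weight_nets = nn.ModuleList(
            nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, n_agents))
            for _ in range(n_heads)
        )
        # State-dependent scalar bias per head.
        self.bias_nets = nn.ModuleList(
            nn.Sequential(nn.Linear(state_dim, embed_dim), nn.ReLU(), nn.Linear(embed_dim, 1))
            for _ in range(n_heads)
        )

    def forward(self, agent_qs: torch.Tensor, state: torch.Tensor) -> torch.Tensor:
        # agent_qs: (batch, n_agents) chosen-action values; state: (batch, state_dim)
        head_values = []
        for w_net, b_net in zip(self.weight_nets, self.bias_nets):
            w = torch.abs(w_net(state))                   # (batch, n_agents), >= 0
            v = (w * agent_qs).sum(dim=-1, keepdim=True)  # weighted sum of agent utilities
            head_values.append(v + b_net(state))
        # Average the heads into one joint action-value estimate, shape (batch, 1).
        return torch.stack(head_values, dim=0).mean(dim=0)


if __name__ == "__main__":
    mixer = MultiHeadMixer(n_agents=5, state_dim=48)
    q_joint = mixer(torch.randn(8, 5), torch.randn(8, 48))
    print(q_joint.shape)  # torch.Size([8, 1])
```

Multiple heads give the mixer several independent state-conditioned transformations of the agent utilities; averaging them is one simple way to combine heads, and the released QTRAN++ code may combine them differently.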
DOI: 10.48550/arxiv.2006.12010