Transform networks for cooperative multi-agent deep reinforcement learning

In recent years, the Multi-agent Deep Reinforcement Learning Algorithm has been developing rapidly, in which the value method-based algorithm plays an important role (such as Monotonic Value Function Factorisation (QMIX) and Learning to Factorize with Transformation for Cooperative Multi-Agent Reinf...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Applied intelligence (Dordrecht, Netherlands) Netherlands), 2023-04, Vol.53 (8), p.9261-9269
Hauptverfasser: Wang, Hongbin, Xie, Xiaodong, Zhou, Lianke
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:In recent years, the Multi-agent Deep Reinforcement Learning Algorithm has been developing rapidly, in which the value method-based algorithm plays an important role (such as Monotonic Value Function Factorisation (QMIX) and Learning to Factorize with Transformation for Cooperative Multi-Agent Reinforcement learning (QTRAN)). In spite of the fact, the performance of current value-based multi-agent algorithm under complex scene still can be further improved. In value function-based model, a mixing network is usually used to mix the local action value of each agent to get joint action value when the partial observability will cause the problem of misalignment and unsatisfying mixing results. This paper proposes a multi-agent model called Transform Networks that transform the individual local action-value function gotten by agent network to individual global action-value function, which will avoid the problem of misalignment caused by partial observability when the individual action value is mixed, and the joint action value can represent the cooperative conditions of all agents well. Using the StarCraft Multi-Agent Challenge (SMAC) as the experimental platform, the comparison of the performance of algorithms on five different maps proved that the proposed method has better effect than the current most advanced baseline algorithms.
ISSN:0924-669X
1573-7497
DOI:10.1007/s10489-022-03924-3