Collaborative multi-agent reinforcement learning based on a novel coordination tree frame with dynamic partition


Bibliographic details
Published in: Engineering Applications of Artificial Intelligence, 2014-01, Vol. 27, pp. 191-198
Main authors: Fang, Min; Groen, Frans C.A.; Li, Hao; Zhang, Jujie
Format: Article
Language: English
Description
Summary: In research on team Markov games, the main problems are computing the coordination team dynamically and determining the joint action policy. To address the first problem, a dynamic team partitioning method is proposed based on a novel coordination tree frame. We build a coordination tree from coordination agent subsets and define two breaching weights that represent the weights of an agent cooperating with an agent subset. Each agent chooses the agent subset with the minimum cost as its coordination team, based on the coordination tree. Q-learning based on belief allocation learns the multi-agent joint action policy, which helps the cooperative multi-agent joint action policy converge to the optimum solution. We perform experiments in multiple simulation environments and compare the proposed algorithm with similar ones. Experimental results show that the proposed algorithms are able to dynamically compute the cooperative teams and design the optimum joint action policy for cooperative teams.
• We present a cooperation tree structure that uses subsets of cooperating agents as the nodes of a tree.
• Two kinds of weights are defined, which describe the cost of an agent collaborating with or without an agent subset, respectively.
• Each agent calculates its collaborative agent subset with a minimal cost based on coordination trees.
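The selection step mentioned in the summary, where each agent picks the agent subset with the minimum cost, can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the record does not say how the cost is derived from the two weights or how the tree is traversed, so the sketch assumes the candidate subsets and their costs are already available as a plain mapping. The names `choose_team` and `candidate_costs`, the agent labels, and the numeric costs are all hypothetical.

```python
from typing import Dict, FrozenSet


def choose_team(costs: Dict[FrozenSet[str], float]) -> FrozenSet[str]:
    """Return the candidate agent subset with the smallest cost.

    `costs` is a hypothetical mapping from candidate subsets (for example,
    the nodes of one agent's coordination tree) to an already-computed cost.
    """
    if not costs:
        raise ValueError("no candidate subsets to choose from")
    # Break ties on the sorted member names so the result is deterministic.
    return min(costs, key=lambda subset: (costs[subset], sorted(subset)))


# Illustrative (made-up) costs for one agent; the empty set stands for acting alone.
candidate_costs = {
    frozenset(): 2.0,
    frozenset({"a2"}): 3.0,
    frozenset({"a2", "a3"}): 1.5,
}

team = choose_team(candidate_costs)
```

With these example costs, the agent would coordinate with the subset containing `a2` and `a3`, since that entry has the lowest cost.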
ISSN: 0952-1976, 1873-6769
DOI:10.1016/j.engappai.2013.09.001