Entropy Enhanced Multiagent Coordination Based on Hierarchical Graph Learning for Continuous Action Space

Bibliographic details
Published in:	IEEE Transactions on Cognitive and Developmental Systems, 2024-06, Vol. 16 (3), pp. 1161-1171
Main authors:	Chen, Yining; Wang, Ke; Song, Guanghua; Jiang, Xiaohong
Format:	Article
Language:	English
Subjects:	Continuous action space; Control design; Control methods; Coordination; Deep reinforcement learning; deep reinforcement learning (DRL); Entropy; Feature extraction; Maximum entropy; maximum entropy learning; Multi-agent systems; multiagent; Multiagent systems; Policies; Task complexity; Training

Description
Summary:	In most existing studies on large-scale multiagent coordination, the control methods aim to learn discrete policies for agents with a finite set of choices. They rarely consider selecting actions directly from continuous action spaces to provide more accurate control, and are therefore normally unsuitable for more complex tasks. To address the control problem of large-scale multiagent systems with continuous action spaces, we propose a novel multiagent reinforcement learning (MARL) approach, the entropy-enhanced hierarchical graph continuous action multiagent coordination control method (EHCAMA), which derives stable continuous policies by constructing a new network architecture in an actor-critic framework. By optimizing policies with maximum entropy learning, agents improve their exploration ability during training and achieve strong performance in execution. Further, we employ hierarchical graph attention networks (HGATs) and gated recurrent units (GRUs) to improve the scalability and transferability of our method. We evaluate EHCAMA in simulation on cooperative tasks with both homogeneous and heterogeneous agents, and compare it with soft actor-critic-hierarchical graph recurrent network (SAC-HGRN), hierarchical graph attention-based multiagent actor-critic (HAMA), actor hierarchical attention critic (AHAC), and adaptive and gated graph attention network (AGGAT)-Comm. The experimental results show that our method consistently outperforms the baselines in large-scale multiagent scenarios.
ISSN:	2379-8920 2379-8939
DOI:	10.1109/TCDS.2023.3339131
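
Note on maximum entropy learning: the summary states that policies are optimized with maximum entropy learning over continuous actions. The sketch below illustrates the general idea only, as it appears in soft actor-critic style methods, and is not the EHCAMA loss itself. It assumes a diagonal Gaussian policy; the function names, the example Q-value, and the temperature `alpha` are hypothetical choices made for illustration.

```python
import math

def gaussian_entropy(log_stds):
    """Differential entropy (in nats) of a diagonal Gaussian policy.

    For each action dimension with standard deviation sigma, the entropy is
    0.5 * log(2 * pi * e) + log(sigma); dimensions are independent, so they sum.
    """
    per_dim_const = 0.5 * math.log(2 * math.pi * math.e)
    return sum(per_dim_const + ls for ls in log_stds)

def soft_objective(q_value, log_stds, alpha=0.2):
    """Entropy-regularized objective: Q estimate plus alpha times policy entropy.

    alpha is an assumed temperature that trades off reward against exploration.
    """
    return q_value + alpha * gaussian_entropy(log_stds)

# Two hypothetical policies with the same Q estimate but different spread.
narrow = soft_objective(1.0, [math.log(0.1), math.log(0.1)])
wide = soft_objective(1.0, [math.log(1.0), math.log(1.0)])
print(f"narrow policy objective: {narrow:.4f}")
print(f"wide policy objective:   {wide:.4f}")
```

With equal Q estimates, the wider policy scores higher because of its entropy bonus, which is the mechanism that encourages exploration during training.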