Entropy Enhanced Multiagent Coordination Based on Hierarchical Graph Learning for Continuous Action Space
Published in: | IEEE Transactions on Cognitive and Developmental Systems 2024-06, Vol.16 (3), p.1161-1171 |
---|---|
Main authors: | , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Abstract: | In most existing studies on large-scale multiagent coordination, the control methods aim to learn discrete policies for agents with finite choices. They rarely consider selecting actions directly from continuous action spaces to provide more accurate control; therefore, they are normally unsuitable for more complex tasks. To solve the control issue of large-scale multiagent systems with continuous action spaces, we propose a novel multiagent reinforcement learning (MARL) approach named entropy-enhanced hierarchical graph continuous action multiagent coordination control method (EHCAMA) to derive stable continuous policies, by constructing a new network architecture in an actor-critic framework. By optimizing policies with maximum entropy learning, agents improve their exploration ability in training and acquire an excellent performance in execution. Further, we employ hierarchical graph attention networks (HGATs) and gated recurrent units (GRUs) to improve the scalability and transferability of our method. We simulate the performance of EHCAMA for cooperative tasks with both homogeneous and heterogeneous agents, and compare it with soft actor-critic-hierarchical graph recurrent network (SAC-HGRN), hierarchical graph attention-based multiagent actor-critic (HAMA), actor hierarchical attention critic (AHAC), and adaptive and gated graph attention network (AGGAT)-Comm. The experimental results show that our method consistently outperforms the baselines in large-scale multiagent scenarios. |
---|---|
ISSN: | 2379-8920 2379-8939 |
DOI: | 10.1109/TCDS.2023.3339131 |