Reward-Reinforced Generative Adversarial Networks for Multi-Agent Systems

Bibliographic details
Published in:	IEEE Transactions on Emerging Topics in Computational Intelligence, 2022-06, Vol. 6 (3), pp. 479-488
Main authors:	Zheng, Changgang; Yang, Shufan; Parra-Ullauri, Juan Marcelo; Garcia-Dominguez, Antonio; Bencomo, Nelly
Format:	Article
Language:	English
Subjects:	Aerospace industry; airborne base station (ABS); Approximation; Base stations; Cost function; GAN; Generative adversarial networks; Generators; Industrial robots; Machine learning; Mathematical analysis; Mathematical model; multi-agent; Multi-agent systems; Multiagent systems; Optimization; Radio equipment; Reinforcement learning; reward-reinforced GAN; Robotics; Training

Description
Abstract:	Multi-agent systems deliver highly resilient and adaptable solutions for common problems in telecommunications, aerospace, and industrial robotics. However, achieving an optimal global goal remains a persistent obstacle for collaborative multi-agent systems, where learning affects the behaviour of more than one agent. A number of nonlinear function approximation methods have been proposed for solving the Bellman equation, which describes a recursive form of an optimal policy. However, how to leverage the value distribution in reinforcement learning, and how to improve the efficiency and efficacy of such systems, remain open challenges. In this work, we developed a reward-reinforced generative adversarial network to represent the distribution of the value function, replacing the approximation of Bellman updates. We demonstrated that our method is resilient and outperforms conventional reinforcement learning methods. The method is also applied to a practical case study: maximising the number of user connections to autonomous airborne base stations in a mobile communication network. Our method maximises the data likelihood using a cost function under which agents have optimal learned behaviours. This reward-reinforced generative adversarial network can be used as a generic framework for multi-agent learning at the system level.
ISSN:	2471-285X
DOI:	10.1109/TETCI.2021.3082204