Multiagent Meta-Reinforcement Learning for Adaptive Multipath Routing Optimization

Bibliographic details
Published in:	IEEE Transactions on Neural Networks and Learning Systems, 2022-10, Vol. 33 (10), pp. 5374-5386
Main authors:	Chen, Long; Hu, Bin; Guan, Zhi-Hong; Zhao, Lian; Shen, Xuemin
Format:	Article
Language:	English
Subjects:	Adaptive learning; Adaptive routing; Algorithms; Communications traffic; Exploitation; Heuristic algorithms; Learning; Markov processes; metapolicy gradient; multiagent; Multiagent systems; Optimization; Policies; Reinforcement; reinforcement learning (RL); Routing; Routing protocols; Spread spectrum communication; Task analysis; Traffic; Training

Description
Abstract:	In this article, we investigate the routing problem of packet networks through multiagent reinforcement learning (RL), a challenging topic in distributed and autonomous networked systems. Specifically, the routing problem is modeled as a networked multiagent partially observable Markov decision process (MDP). Since the MDP of a network node is affected not only by its neighboring nodes' policies but also by the network traffic demand, it becomes a multitask learning problem. Inspired by the recent success of RL and metalearning, we propose two novel model-free multiagent RL algorithms, named multiagent proximal policy optimization (MAPPO) and multiagent metaproximal policy optimization (meta-MAPPO), to optimize network performance under fixed and time-varying traffic demand, respectively. A practicable distributed implementation framework is designed based on the separability of exploration and exploitation in training MAPPO. Simulation results show that the proposed algorithms perform well compared with existing routing optimization policies.
ISSN:	2162-237X 2162-2388
DOI:	10.1109/TNNLS.2021.3070584