Multiagent Meta-Reinforcement Learning for Adaptive Multipath Routing Optimization
Published in: | IEEE Transactions on Neural Networks and Learning Systems 2022-10, Vol.33 (10), p.5374-5386 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | In this article, we investigate the routing problem of packet networks through multiagent reinforcement learning (RL), a challenging topic in distributed and autonomous networked systems. Specifically, the routing problem is modeled as a networked multiagent partially observable Markov decision process (MDP). Since the MDP of a network node is affected not only by its neighboring nodes' policies but also by the network traffic demand, it becomes a multitask learning problem. Inspired by the recent success of RL and metalearning, we propose two novel model-free multiagent RL algorithms, named multiagent proximal policy optimization (MAPPO) and multiagent metaproximal policy optimization (meta-MAPPO), to optimize network performance under fixed and time-varying traffic demand, respectively. A practical distributed implementation framework is designed based on the separability of exploration and exploitation in training MAPPO. Simulation results show that the proposed algorithms perform well compared with existing routing optimization policies. |
---|---|
ISSN: | 2162-237X, 2162-2388 |
DOI: | 10.1109/TNNLS.2021.3070584 |