A meta–reinforcement learning algorithm for traffic signal control to automatically switch different reward functions according to the saturation level of traffic flows

Reinforcement learning (RL) algorithms have been widely applied in solving traffic signal control problems. Traffic environments, however, are intrinsically nonstationary, which creates a convergence problem that RL algorithms struggle to overcome. Basically, as a target problem for an RL algorithm,...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Computer-aided civil and infrastructure engineering 2023-04, Vol.38 (6), p.779-798
Hauptverfasser:	Kim, Gyeongjun, Kang, Jiwon, Sohn, Keemin
Format:	Artikel
Sprache:	eng
Schlagworte:	Algorithms Classification Machine learning Markov processes Traffic congestion Traffic control Traffic flow Traffic signals
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Reinforcement learning (RL) algorithms have been widely applied in solving traffic signal control problems. Traffic environments, however, are intrinsically nonstationary, which creates a convergence problem that RL algorithms struggle to overcome. Basically, as a target problem for an RL algorithm, the Markov decision process (MDP) can be solved only when both the transition and reward functions do not vary. Unfortunately, the environment for traffic signal control is not stationary since the goal of traffic signal control varies according to congestion levels. For unsaturated traffic conditions, the objective of traffic signal control should be to minimize vehicle delay. On the other hand, the objective must be to maximize the throughput when traffic flow is saturated. A multiregime analysis is possible for varying conditions, but classifying the traffic regime creates another complex task. The present study provides a meta‐RL algorithm that embeds a latent vector to recognize the different contexts of an environment in order to automatically classify traffic regimes and apply a customized reward for each context. In simulation experiments, the proposed meta‐RL algorithm succeeded in differentiating rewards according to the saturation level of traffic conditions.
ISSN:	1093-9687 1467-8667
DOI:	10.1111/mice.12924