Optimizing Traffic Flow With Reinforcement Learning: A Study on Traffic Light Management

Bibliographic details
Published in:	IEEE transactions on intelligent transportation systems, 2024-07, Vol. 25 (7), p. 7467-7476
Main authors:	Merbah, Amal; Ben-Othman, Jalel
Format:	Article
Language:	English
Subjects:	Configurations; Deep learning; Delays; Energy consumption; Management systems; Markov processes; Mathematical analysis; Mathematical models; mobility management; Optimization; policy iteration; Queuing theory; Real time; Real-time systems; Switches; Switching; Traffic control; Traffic flow; Traffic intersections; traffic light management; Traffic signals; Urban areas; Vehicle dynamics

Description
Summary:	Non-adaptive traffic light management has proven inefficient and has several drawbacks, chiefly higher CO2 emissions, fuel consumption, waiting times, and congestion. In this study, we propose a traffic signal control system that combines the accuracy of mathematical modeling with the real-time, adaptive behavior of deep learning (DL). The DL configuration is based on a mathematical model that describes the interaction between the environment and the intersection as a Markov decision process (MDP) while taking structural and safety issues into consideration. As a resolution method, we use policy iteration (PI), which returns the best policy for choosing the action that determines each phase duration. The resulting phases minimize the reward, defined as the average waiting time (AWT) of all vehicles crossing the intersection. PI proved more efficient than management systems based on fixed durations in various traffic situations. Rather than triggering PI for every new situation encountered, and in order to minimize processing time, PI serves as the learning method for the DL program. We build a learning database by storing several situations, each represented by the variables input flow, latest switching dates, output flows, traffic light states, and queue lengths, together with the solution returned by PI as the policy for selecting the next switching dates. With this configuration, DL was able to respond optimally and in real time to different levels of throughput: low, medium, and high.
ISSN:	1524-9050 1558-0016
DOI:	10.1109/TITS.2024.3351471
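
The policy iteration step named in the summary can be illustrated with a minimal sketch on a toy intersection. This is not the authors' model: the record does not give their state space, transition probabilities, or phase-duration actions. The sketch assumes two approaches with capped queues, an action that simply picks which approach gets green for one time slot, Bernoulli arrivals, and the total queue length as a stand-in cost for the average waiting time. The names `Q_MAX`, `ARRIVAL_P`, `GAMMA`, `transitions`, and `cost` are hypothetical.

```python
import itertools

# All constants below are illustrative assumptions, not values from the paper.
Q_MAX = 4                # queue cap per approach
ARRIVAL_P = (0.5, 0.2)   # per-slot arrival probability on approach 0 and 1
GAMMA = 0.9              # discount factor
ACTIONS = (0, 1)         # index of the approach that gets green this slot

STATES = list(itertools.product(range(Q_MAX + 1), repeat=2))


def transitions(state, action):
    """Return [(probability, next_state)]: serve one vehicle, then add arrivals."""
    q = list(state)
    q[action] = max(q[action] - 1, 0)
    out = {}
    for a0, a1 in itertools.product((0, 1), repeat=2):
        p0 = ARRIVAL_P[0] if a0 else 1 - ARRIVAL_P[0]
        p1 = ARRIVAL_P[1] if a1 else 1 - ARRIVAL_P[1]
        nxt = (min(q[0] + a0, Q_MAX), min(q[1] + a1, Q_MAX))
        out[nxt] = out.get(nxt, 0.0) + p0 * p1
    return [(p, s) for s, p in out.items()]


def cost(state):
    """Per-slot cost: vehicles currently queued (a proxy for waiting time)."""
    return float(sum(state))


def q_value(state, action, V):
    """Expected discounted cost of taking `action` in `state`, then following V."""
    return cost(state) + GAMMA * sum(p * V[s2] for p, s2 in transitions(state, action))


def policy_iteration(tol=1e-9, improve_eps=1e-6):
    policy = {s: ACTIONS[0] for s in STATES}
    V = {s: 0.0 for s in STATES}
    while True:
        # Policy evaluation: sweep until the value of the current policy settles.
        while True:
            delta = 0.0
            for s in STATES:
                v = q_value(s, policy[s], V)
                delta = max(delta, abs(v - V[s]))
                V[s] = v
            if delta < tol:
                break
        # Policy improvement: switch to a strictly cheaper action where one exists.
        stable = True
        for s in STATES:
            best = min(ACTIONS, key=lambda a: q_value(s, a, V))
            if q_value(s, best, V) < q_value(s, policy[s], V) - improve_eps:
                policy[s] = best
                stable = False
        if stable:
            return policy, V


if __name__ == "__main__":
    policy, V = policy_iteration()
    for s in STATES:
        print(s, "-> green for approach", policy[s])
```

Pairs of (state, chosen action) produced this way are the kind of labelled examples the summary describes storing in a learning database for the DL program. The features in the paper's database (input flow, switching dates, output flows, light states, queue lengths) are richer than the two queue lengths used here.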