Reference RL: Reinforcement learning with reference mechanism and its application in traffic signal control

Bibliographic details
Published in:	Information Sciences 2025-01, Vol. 689, p. 121485, Article 121485
Main authors:	Lu, Yunxue, Hegyi, Andreas, Maria Salomons, A., Wang, Hao
Format:	Article
Language:	English
Keywords:	Real-world learning; Reference mechanism; Reference policy; Reinforcement learning; Traffic signal control

Description
Abstract:	This paper addresses the challenges of deploying reinforcement learning (RL) models for traffic signal control (TSC) in real-world environments. Training in the real world avoids mismatches between the simulation environment and actual traffic conditions, so the agent performs better once deployed. However, free exploration by agents during real-world training can disrupt traffic operations. To mitigate this, this paper proposes a reference mechanism that guides decision-making within the RL framework. A reference timing policy, typically a model-based signal strategy, is integrated into the learning process to provide agents with reference actions. Specifically, an additional Q-value function evaluates both the agent's actions and those of the reference policy, so that actions can be adjusted before they are executed in the real traffic system. Numerical results indicate that the reference mechanism effectively improves system performance early in training, thereby accelerating learning. We also combine the reference RL method with a pretraining procedure and, separately, with a jump-start algorithm. Experimental results demonstrate that both combinations further improve system performance and facilitate real-world training.
ISSN:	0020-0255
DOI:	10.1016/j.ins.2024.121485
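The abstract describes an additional Q-value function that scores both the agent's proposed action and the reference policy's action before anything reaches the real intersection. The abstract does not specify how that function is trained or which adjustment rule is used, so the following is only a minimal tabular sketch under stated assumptions, not the authors' method. It assumes (1) the extra Q-table estimates the value of taking an action and then following the reference policy, (2) the agent's action is executed only if that table rates it strictly higher than the reference action, and (3) a fixed-time plan stands in for the model-based reference policy. All names (`ReferenceGuidedQAgent`, `fixed_time_reference`, `q_ref`) are hypothetical.

```python
import random
from collections import defaultdict


class ReferenceGuidedQAgent:
    """Hypothetical sketch of a reference mechanism for tabular Q-learning.

    q_agent: the agent's own Q-table, trained with standard Q-learning.
    q_ref:   the additional Q-table; ASSUMED here to estimate the value of
             taking action a in state s and then following the reference policy.
    """

    def __init__(self, n_actions, reference_policy, alpha=0.1, gamma=0.9,
                 epsilon=0.1, seed=0):
        self.n_actions = n_actions
        self.reference_policy = reference_policy  # callable: state -> action
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon
        self.q_agent = defaultdict(lambda: [0.0] * n_actions)
        self.q_ref = defaultdict(lambda: [0.0] * n_actions)
        self.rng = random.Random(seed)

    def propose(self, state):
        """Epsilon-greedy proposal from the agent's own Q-table."""
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(self.n_actions)
        q = self.q_agent[state]
        return max(range(self.n_actions), key=q.__getitem__)

    def act(self, state):
        """Adjust the agent's proposal before it is executed.

        ASSUMED rule: keep the agent's action only if the additional Q-table
        rates it strictly higher than the reference action; otherwise defer
        to the reference policy (ties also go to the reference action).
        """
        a_agent = self.propose(state)
        a_ref = self.reference_policy(state)
        q = self.q_ref[state]
        return a_agent if q[a_agent] > q[a_ref] else a_ref

    def update(self, s, a, r, s_next):
        """Update both tables from the transition that was actually executed."""
        # Standard Q-learning target for the agent's own table.
        target = r + self.gamma * max(self.q_agent[s_next])
        self.q_agent[s][a] += self.alpha * (target - self.q_agent[s][a])
        # Evaluator target bootstraps on the reference policy's next action.
        a_next_ref = self.reference_policy(s_next)
        target_ref = r + self.gamma * self.q_ref[s_next][a_next_ref]
        self.q_ref[s][a] += self.alpha * (target_ref - self.q_ref[s][a])


def fixed_time_reference(state):
    """Hypothetical stand-in for a model-based timing plan: the state is
    assumed to be a time step, and phases 0 and 1 alternate every 30 steps."""
    return (state // 30) % 2


agent = ReferenceGuidedQAgent(n_actions=2, reference_policy=fixed_time_reference,
                              epsilon=0.0)
agent.q_agent[0] = [0.0, 1.0]   # the agent would prefer phase 1 at t=0
print(agent.act(0))              # evaluator has no evidence yet: defers to reference
agent.q_ref[0] = [0.0, 0.5]     # evaluator now rates phase 1 higher
print(agent.act(0))              # agent's own action is allowed through
```

With the evaluator untrained, every tie resolves to the reference action, which mirrors the abstract's point that the reference policy protects traffic operations early in training; as `q_ref` accumulates evidence, more of the agent's own actions pass through.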