Centralized Cooperation for Connected Autonomous Vehicles at Intersections by Safe Deep Reinforcement Learning

Bibliographic details
Published in: IEEE Transactions on Mobile Computing, 2024-12, Vol. 23 (12), p. 12830-12847
Main authors: Zhao, Rui; Li, Yun; Wang, Kui; Fan, Yuze; Gao, Fei; Gao, Zhenhai
Format: Journal article
Language: English
Description
Abstract: Connected and automated vehicles (CAVs) have the potential to transform traffic management, especially at intersections. Traditional traffic signals might become obsolete with the implementation of autonomous intersection management (AIM) systems, which aim for efficient and safe vehicle flow. Current AIM methods often rely on optimization control algorithms, which are not computationally efficient. Some methods use reinforcement learning (RL) but compromise safety for rewards and simplify traffic scenarios by designating specific turn lanes. This paper introduces a novel approach, the risk situation-aware constrained policy optimization (RSCPO), to enhance RL training with safety assurance. It uses Kullback-Leibler (KL) divergence to form a trust region, identifying risk levels in policy updates that could lead to dangerous situations, and suggests safe policy update mechanisms. Furthermore, the paper presents a safety reinforced all-directional autonomous intersection management (SafeR-ADAIM) algorithm. This algorithm accounts for the complexity of unpredictable all-direction turn lanes and collaboratively ensures the safety, efficiency, and smooth operation of CAVs at intersections. In simulations, our method surpasses the model predictive control (MPC)-based method in computational and traffic efficiency by factors of 67.81 and 1.46, respectively. Additionally, it reduces the mean collision rate from at most 35.01% to 0% compared to non-safety-aware RL methods.
ISSN: 1536-1233, 1558-0660
DOI: 10.1109/TMC.2024.3417441