Intersection decision making for autonomous vehicles based on improved PPO algorithm
Published in: | IET Intelligent Transport Systems, 2024-12, Vol. 18 (S1), pp. 2921-2938 |
---|---|
Format: | Article |
Language: | English |
Abstract: | The deployment of autonomous vehicles (AVs) in complex urban environments faces numerous challenges, especially at intersections, where AVs coexist with human‐driven vehicles (HVs) and safety risks increase. In response, this study proposes an improved control strategy based on the Proximal Policy Optimization (PPO) algorithm, called MSA‐PPO, designed specifically for hybrid intersections shared by AVs and HVs. First, a Self‐Attention Mechanism (SAM) is introduced into the algorithmic framework to quickly identify, from different perspectives, the surrounding vehicles that have the greatest impact on the ego vehicle, which accelerates data processing and improves decision quality. Second, an invalid action masking mechanism is adopted to reduce the action space, ensuring that actions are selected only from the feasible set and thereby improving decision efficiency. Finally, comparative and ablation experiments are conducted in hybrid intersection simulation environments of varying complexity to validate the algorithm's effectiveness. The results show that, compared with the baseline algorithms, the improved algorithm converges faster, achieves higher decision accuracy, and maintains the highest driving speeds.
In this article, we improve the proximal policy optimisation (PPO) algorithm from deep reinforcement learning and propose the MSA‐PPO algorithm. MSA‐PPO processes its input data with a self‐attention mechanism, which identifies and focuses on the most important information in the interactions between vehicles. It also applies an invalid action masking mechanism, which restricts the choice to actions that are valid under the current conditions and so narrows the decision space. Together, these changes substantially improve learning efficiency and, in turn, the overall performance of the system. |
ISSN: | 1751-956X, 1751-9578 |
DOI: | 10.1049/itr2.12593 |
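The self‐attention step described in the abstract can be illustrated with a minimal single-head sketch, assuming each vehicle is described by a small numeric feature vector (for example relative position and velocity) and that the ego vehicle's query attends over every vehicle in the scene, itself included. The function names, the feature layout, and the projection matrices `w_q`, `w_k`, `w_v` are hypothetical placeholders for learned parameters; the article's actual network architecture is not reproduced here.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def ego_attention(ego: Vector, others: List[Vector], w_q: Matrix,
                  w_k: Matrix, w_v: Matrix) -> Tuple[Vector, Vector]:
    """Single-head scaled dot-product attention of the ego vehicle over the
    scene (ego first, then the surrounding vehicles).

    The projection matrices are hypothetical stand-ins for learned weights.
    Returns the attention weights (one per vehicle) and the attended vector.
    """
    vehicles = [ego] + others
    query = matvec(w_q, ego)                       # what the ego vehicle looks for
    keys = [matvec(w_k, v) for v in vehicles]      # what each vehicle offers
    values = [matvec(w_v, v) for v in vehicles]    # information to aggregate
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, key)) / scale for key in keys]
    weights = softmax(scores)
    attended = [sum(w * val[i] for w, val in zip(weights, values))
                for i in range(len(values[0]))]
    return weights, attended
```

A larger attention weight marks a vehicle that contributes more to the ego vehicle's attended representation. The phrase "from different perspectives" presumably refers to multiple attention heads; under that assumption, a multi-head variant would run this computation with a separate set of projections per head and concatenate the outputs.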
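Invalid action masking is commonly implemented by giving infeasible actions zero probability, which is equivalent to setting their logits to negative infinity before the softmax, so that the policy can only sample from the feasible set. The sketch below assumes a hypothetical three-action longitudinal action set and a hypothetical feasibility rule based on speed limits; the article's actual action space and masking conditions are not given here.

```python
import math
import random
from typing import List, Sequence

# Hypothetical discrete longitudinal action set (not taken from the article).
ACTIONS = ["SLOWER", "IDLE", "FASTER"]


def feasible_mask(speed: float, v_min: float, v_max: float) -> List[bool]:
    """Hypothetical feasibility rule: no braking at v_min, no accelerating at v_max."""
    return [speed > v_min, True, speed < v_max]


def masked_softmax(logits: Sequence[float], mask: Sequence[bool]) -> List[float]:
    """Softmax in which invalid (False) actions get probability exactly 0,
    equivalent to setting their logits to -inf before normalising."""
    if len(logits) != len(mask):
        raise ValueError("logits and mask must have the same length")
    if not any(mask):
        raise ValueError("at least one action must be feasible")
    top = max(z for z, m in zip(logits, mask) if m)  # stabilise over valid logits only
    exps = [math.exp(z - top) if m else 0.0 for z, m in zip(logits, mask)]
    total = sum(exps)
    return [e / total for e in exps]


def sample_action(logits: Sequence[float], mask: Sequence[bool],
                  rng: random.Random) -> int:
    """Sample an action index from the masked policy distribution."""
    probs = masked_softmax(logits, mask)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

When masking is combined with PPO, the same mask would also be applied when recomputing action log-probabilities during the policy update, so that the probability ratio compares distributions over the same feasible set.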
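MSA‐PPO builds on PPO, whose standard formulation maximises the clipped surrogate objective L = mean over t of min(r_t * A_t, clip(r_t, 1 - eps, 1 + eps) * A_t), where r_t = pi_new(a_t | s_t) / pi_old(a_t | s_t) is the probability ratio and A_t is the advantage estimate. A minimal sketch of that objective over a batch of log-probabilities follows; it omits the value-function and entropy terms, and the default eps = 0.2 is an assumed common choice, not a value reported in the article.

```python
import math
from typing import Sequence


def ppo_clip_objective(new_logp: Sequence[float], old_logp: Sequence[float],
                       advantages: Sequence[float], eps: float = 0.2) -> float:
    """Mean PPO clipped surrogate objective over a batch (to be maximised).

    new_logp / old_logp: log pi(a_t | s_t) under the current policy and the
    policy that collected the data; advantages: advantage estimates A_t.
    """
    if not (len(new_logp) == len(old_logp) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)             # r_t = pi_new / pi_old
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)
        total += min(ratio * adv, clipped * adv)      # pessimistic bound
    return total / len(advantages)
```

Clipping removes the incentive to push the ratio outside [1 - eps, 1 + eps], and taking the minimum makes the objective pessimistic: for example, with a ratio of 0.5 and an advantage of -1, the clipped term (-0.8) is selected over the unclipped one (-0.5).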