Safe Reinforcement Learning for Autonomous Vehicle Using Monte Carlo Tree Search

Bibliographic Details
Published in:	IEEE Transactions on Intelligent Transportation Systems, 2022-07, Vol. 23 (7), pp. 6766-6773
Main authors:	Mo, Shuojie; Pei, Xiaofei; Wu, Chaoxian
Format:	Article
Language:	English
Subjects:	Autonomous vehicle; Autonomous vehicles; Decision making; Long short term memory; Machine learning; Modules; Monte Carlo methods; Monte Carlo simulation; Monte Carlo tree search; Reinforcement learning; Risk; Safety; Search algorithms; Search problems; State estimation

Description
Summary:	Reinforcement learning (RL) has gradually demonstrated its decision-making ability in autonomous driving. An RL agent learns how to map states to actions by interacting with the environment so as to maximize the long-term reward. Within a limited number of interactions, the learner obtains a suitable driving policy according to the designed reward function. However, traditional reinforcement learning produces many unsafe behaviors during training. This paper proposes an RL-based method that combines an RL agent with the Monte Carlo tree search (MCTS) algorithm to reduce unsafe behaviors. The proposed safe reinforcement learning framework consists mainly of two modules: a risk state estimation module and a safe policy search module. When the risk state estimation module, using the current state information and the action output by the RL agent, predicts that the future state will be risky, the MCTS-based safe policy search module is activated to guarantee safer exploration by adding an additional reward for risky actions. Tested in several random overtaking scenarios, the approach converges faster and behaves more safely than traditional reinforcement learning.
ISSN:	1524-9050, 1558-0016
DOI:	10.1109/TITS.2021.3061627
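
The two-module framework described in the summary above can be illustrated with a minimal sketch. It is not the authors' implementation: the one-dimensional driving model, the gap-based risk check (`is_risky`), the action set, the threshold `SAFE_GAP`, the value `RISK_PENALTY`, and all function names are assumptions chosen only to make the control flow concrete. The sketch checks the action proposed by the RL agent, and if the predicted next state is risky, it runs a small Monte Carlo tree search for a replacement action and returns an extra (negative) reward for the risky proposal.

```python
import math
import random

ACTIONS = (-1.0, 0.0, 1.0)  # hypothetical accelerations: brake, hold, accelerate
SAFE_GAP = 5.0              # hypothetical minimum gap to the lead vehicle
RISK_PENALTY = -1.0         # hypothetical extra reward for a risky proposal


def step(state, action, dt=1.0):
    """Toy model (assumption): state = (gap to lead vehicle, closing speed)."""
    gap, closing = state
    closing = closing + action * dt
    return (gap - closing * dt, closing)


def is_safe(state):
    return state[0] >= SAFE_GAP


def is_risky(state, action):
    """Stand-in for the risk state estimation module: check the predicted next state."""
    return not is_safe(step(state, action))


class Node:
    def __init__(self, state, parent=None, action=None):
        self.state, self.parent, self.action = state, parent, action
        self.children, self.visits, self.value = [], 0, 0.0
        # Unsafe states are terminal, so they have nothing to expand.
        self.untried = list(ACTIONS) if is_safe(state) else []

    def best_child(self, c=1.4):
        # UCB1: mean value plus an exploration bonus.
        return max(self.children, key=lambda n: n.value / n.visits
                   + c * math.sqrt(math.log(self.visits) / n.visits))


def rollout(state, rng, depth=5):
    """Random simulation: 1.0 if the gap stays safe for `depth` steps, else 0.0."""
    if not is_safe(state):
        return 0.0
    for _ in range(depth):
        state = step(state, rng.choice(ACTIONS))
        if not is_safe(state):
            return 0.0
    return 1.0


def mcts_safe_action(root_state, iterations=200, seed=0):
    """Stand-in for the safe policy search module."""
    rng = random.Random(seed)
    root = Node(root_state)
    for _ in range(iterations):
        node = root
        # Selection: descend while the node is fully expanded.
        while not node.untried and node.children:
            node = node.best_child()
        # Expansion: add one untried action as a new child.
        if node.untried:
            action = node.untried.pop(0)
            child = Node(step(node.state, action), parent=node, action=action)
            node.children.append(child)
            node = child
        # Simulation and backpropagation.
        reward = rollout(node.state, rng)
        while node is not None:
            node.visits += 1
            node.value += reward
            node = node.parent
    # Recommend the most-visited root action.
    return max(root.children, key=lambda n: n.visits).action


def safe_step(state, proposed_action):
    """Return (action to execute, extra reward for the RL agent)."""
    if not is_risky(state, proposed_action):
        return proposed_action, 0.0
    return mcts_safe_action(state), RISK_PENALTY
```

For example, `safe_step((20.0, 0.0), 1.0)` passes the agent's action through unchanged, while `safe_step((8.0, 3.0), 1.0)` replaces the accelerate command with the braking action found by the search and returns `RISK_PENALTY` for the training signal. In a real system the toy `step` model and the gap threshold would be replaced by a learned or physics-based predictor.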