Constraints Driven Safe Reinforcement Learning for Autonomous Driving Decision-Making
Published in: | IEEE Access 2024, Vol.12, p.128007-128023 |
---|---|
Main authors: | Gao, Fei; Wang, Xiaodong; Fan, Yuze; Gao, Zhenhai; Zhao, Rui |
Format: | Article |
Language: | eng |
Subjects: | Accuracy; Autonomous driving; Autonomous vehicles; Collision rates; constrained policy optimization; Constraints; Cost function; Decision making; Lane changing; Markov processes; Measurement; Optimization; Planning; Reinforcement learning; Road transportation; Safety |
Online access: | Full text |
container_end_page | 128023 |
---|---|
container_issue | |
container_start_page | 128007 |
container_title | IEEE access |
container_volume | 12 |
creator | Gao, Fei; Wang, Xiaodong; Fan, Yuze; Gao, Zhenhai; Zhao, Rui |
description | Although reinforcement learning (RL) methodologies exhibit potential in addressing decision-making and planning problems in autonomous driving, ensuring the safety of the vehicle under all circumstances remains a formidable challenge in practical applications. Current RL methods are predominantly driven by a single reward mechanism and frequently struggle to balance multiple sub-rewards such as safety, comfort, and efficiency. To address these limitations, this paper introduces a constraint-driven safe RL method for decision-making and planning policies in highway scenarios. The method ensures that decisions maximize performance rewards within the bounds of safety constraints and exhibits strong robustness. First, the framework reformulates the autonomous driving decision-making problem as a Constrained Markov Decision Process (CMDP) within the safe RL framework. It then introduces a Multi-Level Safety-Constrained Policy Optimization (MLSCPO) method that incorporates a cost function to handle safety constraints. Finally, simulated tests in the CARLA environment demonstrate that the proposed MLSCPO method outperforms the current state-of-the-art safe reinforcement learning policy, Proximal Policy Optimization with Lagrangian (PPO-Lag), as well as the traditional stable longitudinal and lateral driving model, the Intelligent Driver Model with Minimization of Overall Braking Induced by Lane Changes (IDM+MOBIL). Compared with the classic IDM+MOBIL method, the proposed approach not only drives efficiently but also offers a better driving experience. Compared with the reinforcement learning method PPO-Lag, it significantly enhances safety while maintaining driving efficiency, achieving a zero collision rate. In future work, we plan to extend the method to improve its usability and generalization in real-world applications. |
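To make the constrained-RL setup in the abstract concrete, below is a minimal Python sketch of the Lagrangian-penalized policy objective used by the PPO-Lag baseline named above: a clipped reward surrogate is maximized while a multiplier penalizes expected safety cost above a limit, and the multiplier is updated by dual ascent. This is an illustrative sketch only; the paper's own MLSCPO algorithm is not reproduced here, and all function and variable names (lagrangian_objective, update_multiplier, cost_limit, etc.) are assumptions, not the authors' implementation.

```python
import numpy as np

def lagrangian_objective(ratio, reward_adv, cost_adv, lam, clip_eps=0.2):
    """PPO-style clipped reward surrogate minus a cost surrogate weighted by lam.

    ratio:      probability ratio pi_new / pi_old for the sampled actions
    reward_adv: advantage estimates for the performance reward
    cost_adv:   advantage estimates for the safety cost
    lam:        Lagrange multiplier on the cost term
    """
    clipped = np.clip(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    reward_term = np.minimum(ratio * reward_adv, clipped * reward_adv).mean()
    cost_term = (ratio * cost_adv).mean()
    # Maximize reward while penalizing expected safety cost.
    return reward_term - lam * cost_term

def update_multiplier(lam, avg_episode_cost, cost_limit, lr=0.05):
    """Dual ascent: grow lam when observed cost exceeds the allowed limit."""
    return max(0.0, lam + lr * (avg_episode_cost - cost_limit))

# Toy usage with random data standing in for one batch of rollouts.
rng = np.random.default_rng(0)
ratio = np.exp(rng.normal(0.0, 0.05, size=256))   # pi_new / pi_old
reward_adv = rng.normal(0.0, 1.0, size=256)       # reward advantages
cost_adv = rng.normal(0.0, 1.0, size=256)         # safety-cost advantages
lam = 1.0
objective = lagrangian_objective(ratio, reward_adv, cost_adv, lam)
lam = update_multiplier(lam, avg_episode_cost=0.3, cost_limit=0.1)
print(objective, lam)
```

In a full training loop this objective would be differentiated with respect to the policy parameters (e.g., in PyTorch) at every update, with the multiplier update applied once per batch; MLSCPO as described in the abstract replaces this single penalty with multi-level safety constraints.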
doi_str_mv | 10.1109/ACCESS.2024.3454249 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 2169-3536 |
ispartof | IEEE access, 2024, Vol.12, p.128007-128023 |
issn | 2169-3536 |
language | eng |
recordid | cdi_proquest_journals_3106514139 |
source | IEEE Open Access Journals; DOAJ Directory of Open Access Journals; Elektronische Zeitschriftenbibliothek - Frei zugängliche E-Journals |
subjects | Accuracy; Autonomous driving; Autonomous vehicles; Collision rates; constrained policy optimization; Constraints; Cost function; Decision making; Lane changing; Markov processes; Measurement; Optimization; Planning; Reinforcement learning; Road transportation; Safety |
title | Constraints Driven Safe Reinforcement Learning for Autonomous Driving Decision-Making |