Research on a Personalized Decision Control Algorithm for Autonomous Vehicles Based on the Reinforcement Learning from Human Feedback Strategy

To address the shortcomings of previous autonomous decision models, which often overlook the personalized features of users, this paper proposes a personalized decision control algorithm for autonomous vehicles based on RLHF (reinforcement learning from human feedback). The algorithm combines two reinforcement learning approaches, DDPG (Deep Deterministic Policy Gradient) and PPO (Proximal Policy Optimization), and divides the control scheme into three phases: pre-training, human evaluation, and parameter optimization. During the pre-training phase, an agent is trained using the DDPG algorithm. In the human evaluation phase, trajectories generated by the DDPG-trained agent are scored by individuals with different driving styles, and a separate reward model is trained for each style from these scored trajectories. In the parameter optimization phase, the network parameters are updated using the PPO algorithm and the reward values given by the reward model, achieving personalized autonomous vehicle control. To validate the proposed control algorithm, a simulation scenario was built using CARLA_0.9.13 software. The results demonstrate that the algorithm can provide personalized decision control solutions for drivers with different styles, satisfying human preferences while ensuring safety.
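The abstract outlines a three-phase RLHF pipeline: DDPG pre-training, human scoring of the pre-trained agent's trajectories to fit a per-style reward model, and PPO fine-tuning against that learned reward. The sketch below illustrates only the reward-model step in PyTorch as a minimal example of the general technique; it is not the authors' implementation or a CARLA interface, and every name in it (TrajectoryRewardModel, fit_reward_model, the trajectory/score inputs) is a hypothetical placeholder.

```python
# Minimal sketch of the reward-model phase of an RLHF control pipeline.
# Hypothetical names throughout; not the authors' code and not a CARLA API.
import torch
import torch.nn as nn


class TrajectoryRewardModel(nn.Module):
    """Scores a (state, action) pair; summed over a trajectory it should match a human rating."""

    def __init__(self, state_dim: int, action_dim: int, hidden: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, states: torch.Tensor, actions: torch.Tensor) -> torch.Tensor:
        # states: (T, state_dim), actions: (T, action_dim) -> per-step rewards of shape (T,)
        return self.net(torch.cat([states, actions], dim=-1)).squeeze(-1)


def fit_reward_model(model, trajectories, human_scores, epochs=50, lr=1e-3):
    """Regress each trajectory's summed per-step reward onto its human score.

    trajectories: list of (states, actions) tensor pairs from the DDPG-pretrained agent.
    human_scores: list of scalar ratings given by one driver style.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    mse = nn.MSELoss()
    for _ in range(epochs):
        for (states, actions), score in zip(trajectories, human_scores):
            pred_return = model(states, actions).sum()  # predicted trajectory return
            loss = mse(pred_return, torch.as_tensor(score, dtype=torch.float32))
            opt.zero_grad()
            loss.backward()
            opt.step()
    return model
```

In the third phase, PPO would then be run with this learned model supplying the reward signal in place of a hand-designed one, with one reward model (and hence one fine-tuned policy) per driving style, which is where the personalization enters the loop.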


Bibliographic Details
Published in: Electronics (Basel), 2024-06, Vol. 13 (11), p. 2054
Main Authors: Li, Ning; Chen, Pengzhan
Format: Article
Language: English
Online Access: Full text
DOI: 10.3390/electronics13112054
Publisher: MDPI AG, Basel
Rights: 2024 by the authors; Licensee MDPI, Basel, Switzerland. Open access under the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
ISSN: 2079-9292
Source: MDPI - Multidisciplinary Digital Publishing Institute; EZB-FREE-00999 freely available EZB journals
Subjects:
Algorithms
Autonomous vehicles
Behavior
Control algorithms
Control theory
Customization
Decision making
Design
Driverless cars
Feedback
Machine learning
Neural networks
Optimization
Parameters
Traffic
Trajectory analysis
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-14T23%3A33%3A12IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-gale_proqu&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Research%20on%20a%20Personalized%20Decision%20Control%20Algorithm%20for%20Autonomous%20Vehicles%20Based%20on%20the%20Reinforcement%20Learning%20from%20Human%20Feedback%20Strategy&rft.jtitle=Electronics%20(Basel)&rft.au=Li,%20Ning&rft.date=2024-06-01&rft.volume=13&rft.issue=11&rft.spage=2054&rft.pages=2054-&rft.issn=2079-9292&rft.eissn=2079-9292&rft_id=info:doi/10.3390/electronics13112054&rft_dat=%3Cgale_proqu%3EA797898428%3C/gale_proqu%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3067422554&rft_id=info:pmid/&rft_galeid=A797898428&rfr_iscdi=true