Vehicle optimal control method based on deep reinforcement learning

The invention discloses a vehicle optimal control method based on deep reinforcement learning. The method comprises the following steps: step 1, establishing a policy network and mutually independent value networks; step 2, controlling the vehicle to run and collecting samples; step 3, inputting the...

Detailed description

Bibliographic details
Main authors:	HUANG XIWEN, HUANG XIANGDANG, FEI HANSHENG, YANG QIULING
Format:	Patent
Language:	chi ; eng
Subjects:	CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION ; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES ; CONTROLLING ; COUNTING ; PERFORMING OPERATIONS ; PHYSICS ; REGULATING ; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT ; SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES ; TRANSPORTING ; VEHICLES IN GENERAL
Online access:	Order full text

container_end_page
container_issue
container_start_page
container_title
container_volume
creator	HUANG XIWEN ; HUANG XIANGDANG ; FEI HANSHENG ; YANG QIULING
description	The invention discloses a vehicle optimal control method based on deep reinforcement learning. The method comprises the following steps: step 1, establishing a policy network and mutually independent value networks; step 2, controlling the vehicle to run and collecting samples; step 3, inputting the data s_t and a_t into the value networks to obtain two value scores and taking the smaller one as the prediction score; inputting the state s_{t+1} into the policy network to obtain an action a_{t+1}; inputting s_{t+1} and a_{t+1} into the two value networks to obtain two value scores; determining a TD error according to these value scores and the prediction score; and updating the value networks; step 4, updating the policy network after the value networks have been updated twice; and step 5, repeating steps 2-4 to tune the network parameters until the policy network achieves the expected effect, and outputting the finally updated policy network. The stability can be ensured in the process of optimiz
format	Patent
fullrecord	<record><control><sourceid>epo_EVB</sourceid><recordid>TN_cdi_epo_espacenet_CN118372851A</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>CN118372851A</sourcerecordid><originalsourceid>FETCH-epo_espacenet_CN118372851A3</originalsourceid><addsrcrecordid>eNqNyjEOAiEQBVAaC6PeYTyABW6M2xqisbIythuEvy7JMEOA-8fGA1i95q2Ne2FJgUFaesqeKaj0qkwZfdFIb98QSYUiUKgiyaw1IEM6MXyVJJ-tWc2eG3Y_N2Z_uz7d_YCiE1rxAYI-uYe143A-jid7Gf45X445MqU</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>patent</recordtype></control><display><type>patent</type><title>Vehicle optimal control method based on deep reinforcement learning</title><source>esp@cenet</source><creator>HUANG XIWEN ; HUANG XIANGDANG ; FEI HANSHENG ; YANG QIULING</creator><creatorcontrib>HUANG XIWEN ; HUANG XIANGDANG ; FEI HANSHENG ; YANG QIULING</creatorcontrib><description>The invention discloses a vehicle optimal control method based on deep reinforcement learning. The method comprises the following steps: step 1, establishing a strategy network and a mutually independent value network; 2, the vehicle is controlled to run, and samples are collected; 3, inputting the data st and at into a value network to obtain two value scores, and calculating a prediction score by taking the smaller value; inputting the state st + 1 into the strategy network to obtain an action at + 1, respectively inputting the data st + 1 and at + 1 into two value scores in the two value networks, determining a TD error according to the value scores and a prediction score, and updating the value networks; 4, updating the strategy network after the value network is updated twice; and step 5, repeating the steps 2-4 to perform network parameter tuning until the policy network achieves an expected effect, and outputting the finally updated policy network. 
The stability can be ensured in the process of optimiz</description><language>chi ; eng</language><subject>CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE ORDIFFERENT FUNCTION ; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES ; CONTROLLING ; COUNTING ; PERFORMING OPERATIONS ; PHYSICS ; REGULATING ; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TOTHE CONTROL OF A PARTICULAR SUB-UNIT ; SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES ; TRANSPORTING ; VEHICLES IN GENERAL</subject><creationdate>2024</creationdate><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240723&DB=EPODOC&CC=CN&NR=118372851A$$EHTML$$P50$$Gepo$$Hfree_for_read</linktohtml><link.rule.ids>230,308,780,885,25563,76318</link.rule.ids><linktorsrc>$$Uhttps://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20240723&DB=EPODOC&CC=CN&NR=118372851A$$EView_record_in_European_Patent_Office$$FView_record_in_$$GEuropean_Patent_Office$$Hfree_for_read</linktorsrc></links><search><creatorcontrib>HUANG XIWEN</creatorcontrib><creatorcontrib>HUANG XIANGDANG</creatorcontrib><creatorcontrib>FEI HANSHENG</creatorcontrib><creatorcontrib>YANG QIULING</creatorcontrib><title>Vehicle optimal control method based on deep reinforcement learning</title><description>The invention discloses a vehicle optimal control method based on deep reinforcement learning. 
The method comprises the following steps: step 1, establishing a strategy network and a mutually independent value network; 2, the vehicle is controlled to run, and samples are collected; 3, inputting the data st and at into a value network to obtain two value scores, and calculating a prediction score by taking the smaller value; inputting the state st + 1 into the strategy network to obtain an action at + 1, respectively inputting the data st + 1 and at + 1 into two value scores in the two value networks, determining a TD error according to the value scores and a prediction score, and updating the value networks; 4, updating the strategy network after the value network is updated twice; and step 5, repeating the steps 2-4 to perform network parameter tuning until the policy network achieves an expected effect, and outputting the finally updated policy network. The stability can be ensured in the process of optimiz</description><subject>CALCULATING</subject><subject>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</subject><subject>COMPUTING</subject><subject>CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE ORDIFFERENT FUNCTION</subject><subject>CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES</subject><subject>CONTROLLING</subject><subject>COUNTING</subject><subject>PERFORMING OPERATIONS</subject><subject>PHYSICS</subject><subject>REGULATING</subject><subject>ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TOTHE CONTROL OF A PARTICULAR SUB-UNIT</subject><subject>SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES</subject><subject>TRANSPORTING</subject><subject>VEHICLES IN 
GENERAL</subject><fulltext>true</fulltext><rsrctype>patent</rsrctype><creationdate>2024</creationdate><recordtype>patent</recordtype><sourceid>EVB</sourceid><recordid>eNqNyjEOAiEQBVAaC6PeYTyABW6M2xqisbIythuEvy7JMEOA-8fGA1i95q2Ne2FJgUFaesqeKaj0qkwZfdFIb98QSYUiUKgiyaw1IEM6MXyVJJ-tWc2eG3Y_N2Z_uz7d_YCiE1rxAYI-uYe143A-jid7Gf45X445MqU</recordid><startdate>20240723</startdate><enddate>20240723</enddate><creator>HUANG XIWEN</creator><creator>HUANG XIANGDANG</creator><creator>FEI HANSHENG</creator><creator>YANG QIULING</creator><scope>EVB</scope></search><sort><creationdate>20240723</creationdate><title>Vehicle optimal control method based on deep reinforcement learning</title><author>HUANG XIWEN ; HUANG XIANGDANG ; FEI HANSHENG ; YANG QIULING</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-epo_espacenet_CN118372851A3</frbrgroupid><rsrctype>patents</rsrctype><prefilter>patents</prefilter><language>chi ; eng</language><creationdate>2024</creationdate><topic>CALCULATING</topic><topic>COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS</topic><topic>COMPUTING</topic><topic>CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE ORDIFFERENT FUNCTION</topic><topic>CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES</topic><topic>CONTROLLING</topic><topic>COUNTING</topic><topic>PERFORMING OPERATIONS</topic><topic>PHYSICS</topic><topic>REGULATING</topic><topic>ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TOTHE CONTROL OF A PARTICULAR SUB-UNIT</topic><topic>SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES</topic><topic>TRANSPORTING</topic><topic>VEHICLES IN GENERAL</topic><toplevel>online_resources</toplevel><creatorcontrib>HUANG XIWEN</creatorcontrib><creatorcontrib>HUANG XIANGDANG</creatorcontrib><creatorcontrib>FEI HANSHENG</creatorcontrib><creatorcontrib>YANG QIULING</creatorcontrib><collection>esp@cenet</collection></facets><delivery><delcategory>Remote Search 
Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>HUANG XIWEN</au><au>HUANG XIANGDANG</au><au>FEI HANSHENG</au><au>YANG QIULING</au><format>patent</format><genre>patent</genre><ristype>GEN</ristype><title>Vehicle optimal control method based on deep reinforcement learning</title><date>2024-07-23</date><risdate>2024</risdate><abstract>The invention discloses a vehicle optimal control method based on deep reinforcement learning. The method comprises the following steps: step 1, establishing a strategy network and a mutually independent value network; 2, the vehicle is controlled to run, and samples are collected; 3, inputting the data st and at into a value network to obtain two value scores, and calculating a prediction score by taking the smaller value; inputting the state st + 1 into the strategy network to obtain an action at + 1, respectively inputting the data st + 1 and at + 1 into two value scores in the two value networks, determining a TD error according to the value scores and a prediction score, and updating the value networks; 4, updating the strategy network after the value network is updated twice; and step 5, repeating the steps 2-4 to perform network parameter tuning until the policy network achieves an expected effect, and outputting the finally updated policy network. The stability can be ensured in the process of optimiz</abstract><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier
ispartof
issn
language	chi ; eng
recordid	cdi_epo_espacenet_CN118372851A
source	esp@cenet
subjects	CALCULATING ; COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS ; COMPUTING ; CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE OR DIFFERENT FUNCTION ; CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES ; CONTROLLING ; COUNTING ; PERFORMING OPERATIONS ; PHYSICS ; REGULATING ; ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TO THE CONTROL OF A PARTICULAR SUB-UNIT ; SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES ; TRANSPORTING ; VEHICLES IN GENERAL
title	Vehicle optimal control method based on deep reinforcement learning
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-11T01%3A16%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-epo_EVB&rft_val_fmt=info:ofi/fmt:kev:mtx:patent&rft.genre=patent&rft.au=HUANG%20XIWEN&rft.date=2024-07-23&rft_id=info:doi/&rft_dat=%3Cepo_EVB%3ECN118372851A%3C/epo_EVB%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
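
The value-network step and the delayed policy update summarized in the description field can be illustrated with a minimal Python sketch. It is not the patent's implementation: the abstract is truncated and gives no formulas, so the reward term, the discount factor `gamma`, taking the smaller of the two next-state scores, and the names `td_error` and `policy_update_due` are all assumptions made for illustration. The critics and the policy are plain callables standing in for the networks.

```python
from typing import Callable, Sequence

State = Sequence[float]
Critic = Callable[[State, float], float]   # stand-in for one value network
Policy = Callable[[State], float]          # stand-in for the policy network


def td_error(q1: Critic, q2: Critic, policy: Policy,
             s_t: State, a_t: float, reward: float, s_next: State,
             gamma: float = 0.99) -> float:
    """TD error for one sample (s_t, a_t, reward, s_next)."""
    # Prediction score: the smaller of the two value scores for (s_t, a_t).
    prediction = min(q1(s_t, a_t), q2(s_t, a_t))
    # The policy proposes the next action a_{t+1} from s_{t+1}.
    a_next = policy(s_next)
    # Two value scores for (s_{t+1}, a_{t+1}); using their minimum is an assumption.
    next_score = min(q1(s_next, a_next), q2(s_next, a_next))
    # Assumed one-step target: reward plus discounted next-state score.
    target = reward + gamma * next_score
    return target - prediction


def policy_update_due(value_updates: int, delay: int = 2) -> bool:
    """True when the policy should be updated, i.e. after every `delay` value updates."""
    return value_updates > 0 and value_updates % delay == 0


if __name__ == "__main__":
    # Toy critics and policy (hypothetical, for demonstration only).
    q1 = lambda s, a: s[0] + a
    q2 = lambda s, a: s[0] + 2.0 * a
    policy = lambda s: 0.5 * s[0]
    print(td_error(q1, q2, policy, [1.0], 1.0, 1.0, [2.0], gamma=0.5))
    print([policy_update_due(k) for k in range(1, 5)])
```

In a training loop, `td_error` would drive the update of both value networks for each sampled batch, and `policy_update_due` would gate the policy update so that it runs once for every two value-network updates, as step 4 describes.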