Vehicle optimal control method based on deep reinforcement learning

The invention discloses a vehicle optimal control method based on deep reinforcement learning. The method comprises the following steps: step 1, establishing a strategy network and a mutually independent value network; 2, the vehicle is controlled to run, and samples are collected; 3, inputting the...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	HUANG XIWEN, HUANG XIANGDANG, FEI HANSHENG, YANG QIULING
Format:	Patent
Sprache:	chi ; eng
Schlagworte:	CALCULATING COMPUTER SYSTEMS BASED ON SPECIFIC COMPUTATIONAL MODELS COMPUTING CONJOINT CONTROL OF VEHICLE SUB-UNITS OF DIFFERENT TYPE ORDIFFERENT FUNCTION CONTROL SYSTEMS SPECIALLY ADAPTED FOR HYBRID VEHICLES CONTROLLING COUNTING PERFORMING OPERATIONS PHYSICS REGULATING ROAD VEHICLE DRIVE CONTROL SYSTEMS FOR PURPOSES NOT RELATED TOTHE CONTROL OF A PARTICULAR SUB-UNIT SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES TRANSPORTING VEHICLES IN GENERAL
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	The invention discloses a vehicle optimal control method based on deep reinforcement learning. The method comprises the following steps: step 1, establishing a strategy network and a mutually independent value network; 2, the vehicle is controlled to run, and samples are collected; 3, inputting the data st and at into a value network to obtain two value scores, and calculating a prediction score by taking the smaller value; inputting the state st + 1 into the strategy network to obtain an action at + 1, respectively inputting the data st + 1 and at + 1 into two value scores in the two value networks, determining a TD error according to the value scores and a prediction score, and updating the value networks; 4, updating the strategy network after the value network is updated twice; and step 5, repeating the steps 2-4 to perform network parameter tuning until the policy network achieves an expected effect, and outputting the finally updated policy network. The stability can be ensured in the process of optimiz