Path planning method and system based on reinforcement learning and heuristic search

Bibliographic details
Main authors:	ZHANG XIULING, KANG XUENAN, LI JINXIANG
Format:	Patent
Language:	Chinese; English
Subjects:	CONTROL OR REGULATING SYSTEMS IN GENERAL; CONTROLLING; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; GYROSCOPIC INSTRUMENTS; MEASURING; MEASURING DISTANCES, LEVELS OR BEARINGS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS; NAVIGATION; PHOTOGRAMMETRY OR VIDEOGRAMMETRY; PHYSICS; REGULATING; SURVEYING; SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES; TESTING

Description
Abstract:	The invention discloses a path planning method and system based on reinforcement learning and heuristic search. The method comprises the following steps: S1, establishing an environment model under a Markov decision process framework, where the state space of the environment model is S, the action space is A, the reward function is R, and the transition probability function is P; S2, performing sampling updates on the environment model through the Dyna-Q algorithm, evaluating each state-action pair, and determining a target point; S3, based on the target point, calculating through the A* algorithm the Euclidean distances between the current position and the starting point and between the current position and the target point, and determining an initial path; S4, assigning a value to each state-action pair in the initial path; S5, determining an optimal action according to the evaluation value and the assigned value of each state-action pair; and S6, determining
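
Steps S1 to S5 can be illustrated with a minimal Python sketch. It is not the patented implementation: the 5x5 grid, the reward values, the hyperparameters, the fixed `GOAL` (the abstract derives the target point in S2) and the additive `bonus` used to combine the Dyna-Q evaluation with the A* path are all assumptions made for illustration. The A* part uses the standard step-count cost plus a Euclidean heuristic to the target, which may differ from the distance terms used in the patent.

```python
import heapq
import math
import random

# Hypothetical 5x5 grid (not from the patent): 0 = free cell, 1 = obstacle.
GRID = [[0, 0, 0, 0, 0],
        [0, 1, 1, 1, 0],
        [0, 0, 0, 1, 0],
        [0, 1, 0, 0, 0],
        [0, 0, 0, 1, 0]]
START, GOAL = (0, 0), (4, 4)
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """S1 (assumed toy MDP): deterministic transition P and reward R."""
    r, c = state[0] + action[0], state[1] + action[1]
    if not (0 <= r < 5 and 0 <= c < 5) or GRID[r][c] == 1:
        return state, -1.0   # blocked move: stay in place
    if (r, c) == GOAL:
        return (r, c), 10.0
    return (r, c), -0.1

def dyna_q(episodes=150, planning_steps=10, alpha=0.5, gamma=0.95, eps=0.1, seed=0):
    """S2: tabular Dyna-Q, i.e. Q-learning plus replay from a learned model."""
    rng = random.Random(seed)
    q, model = {}, {}        # (state, action index) -> value / (next state, reward)

    def update(s, a, s2, rew):
        best_next = 0.0 if s2 == GOAL else max(q.get((s2, b), 0.0) for b in range(4))
        old = q.get((s, a), 0.0)
        q[(s, a)] = old + alpha * (rew + gamma * best_next - old)

    for _ in range(episodes):
        s = START
        for _ in range(200):                     # step cap per episode
            if rng.random() < eps:
                a = rng.randrange(4)             # explore
            else:
                a = max(range(4), key=lambda b: q.get((s, b), 0.0))
            s2, rew = step(s, ACTIONS[a])
            update(s, a, s2, rew)                # direct reinforcement learning
            model[(s, a)] = (s2, rew)            # model learning
            for _ in range(planning_steps):      # planning from simulated experience
                ps, pa = rng.choice(list(model))
                update(ps, pa, *model[(ps, pa)])
            s = s2
            if s == GOAL:
                break
    return q

def a_star(start, goal):
    """S3: A* with step-count cost and a Euclidean heuristic; returns a cell list."""
    h = lambda p: math.dist(p, goal)
    frontier = [(h(start), 0, start)]
    came_from, cost = {start: None}, {start: 0}
    while frontier:
        _, g, cur = heapq.heappop(frontier)
        if cur == goal:
            path = []
            while cur is not None:
                path.append(cur)
                cur = came_from[cur]
            return path[::-1]
        if g > cost[cur]:
            continue                             # stale queue entry
        for a in ACTIONS:
            nxt, _ = step(cur, a)
            if nxt != cur and g + 1 < cost.get(nxt, math.inf):
                cost[nxt], came_from[nxt] = g + 1, cur
                heapq.heappush(frontier, (g + 1 + h(nxt), g + 1, nxt))
    return None

def combined_policy(q, path, bonus=1.0):
    """S4/S5 (assumed rule): add a fixed bonus to state-action pairs on the path."""
    guide = {}
    for cur, nxt in zip(path, path[1:]):
        a = ACTIONS.index((nxt[0] - cur[0], nxt[1] - cur[1]))
        guide[(cur, a)] = bonus
    return lambda s: max(range(4),
                         key=lambda b: q.get((s, b), 0.0) + guide.get((s, b), 0.0))

def rollout(act, max_steps=50):
    """Follow a policy from START until GOAL or the step limit."""
    s, visited = START, [START]
    while s != GOAL and len(visited) <= max_steps:
        s, _ = step(s, ACTIONS[act(s)])
        visited.append(s)
    return visited
```

A typical call would be `rollout(combined_policy(dyna_q(), a_star(START, GOAL)))`. With an empty Q-table the combined policy simply follows the A* path; as the learned values grow, they can override the path bonus, so the size of `bonus` controls how strongly the heuristic search steers the agent.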