Path planning method and system based on reinforcement learning and heuristic search

The invention discloses a path planning method and system based on reinforcement learning and heuristic search. The method comprises the following steps: S1, establishing an environment model under the Markov decision process framework, wherein the state space of the environment model is S, the action space is A, the reward function is R, and the transition probability function is P; S2, performing sampling updates on the environment model through the Dyna-Q algorithm, evaluating each state-action pair, and determining a target point; S3, based on the target point, calculating, through the A* algorithm, the Euclidean distances between the current position and the starting point and between the current position and the target point, and determining an initial path; S4, assigning a value to each state-action pair in the initial path; S5, determining an optimal action according to the evaluation value and assignment of each state-action pair; and S6, determining …
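For orientation only, the following is a minimal sketch of the general idea described in the abstract: Dyna-Q sampling updates evaluate each state-action pair (S2), the Euclidean distances to the starting point and to the target point supply an A*-style heuristic assignment (S3-S4), and the greedy action is chosen from the learned evaluation value plus that assignment (S5). The grid dimensions, obstacle cells, reward values, weighting factor W, and learning hyperparameters are illustrative assumptions and are not taken from the patent; this is not the claimed implementation.

# Hypothetical sketch: Dyna-Q planning combined with an A*-style Euclidean
# heuristic on a small grid world. All numeric values are assumptions.
import math
import random
from collections import defaultdict

random.seed(0)

ROWS, COLS = 6, 6
START, GOAL = (0, 0), (5, 5)
OBSTACLES = {(2, 2), (2, 3), (3, 3)}          # assumed obstacle cells
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Environment model (S1): deterministic transitions, simple rewards."""
    nxt = (state[0] + action[0], state[1] + action[1])
    if not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS) or nxt in OBSTACLES:
        return state, -1.0          # bump into wall/obstacle: stay, small penalty
    if nxt == GOAL:
        return nxt, 10.0            # reaching the target point
    return nxt, -0.1                # ordinary step cost

def euclidean(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def heuristic_bias(state):
    """A*-style evaluation (S3): distance from the start plus distance to the goal."""
    return -(euclidean(state, START) + euclidean(state, GOAL))

Q = defaultdict(float)   # evaluation value of each state-action pair
model = {}               # learned model used for Dyna-Q planning updates
ALPHA, GAMMA, EPSILON, W = 0.1, 0.95, 0.1, 0.2   # W weights the heuristic bias

def greedy_action(state):
    """S5: score each action by its learned Q-value plus the heuristic bias
    of the state it leads to, and pick the best one."""
    scored = []
    for a in ACTIONS:
        nxt, _ = step(state, a)
        scored.append((Q[(state, a)] + W * heuristic_bias(nxt), a))
    return max(scored)[1]

for episode in range(300):
    s = START
    for _ in range(200):
        a = random.choice(ACTIONS) if random.random() < EPSILON else greedy_action(s)
        nxt, r = step(s, a)
        # Direct RL update of the state-action evaluation (S2)
        best_next = max(Q[(nxt, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        model[(s, a)] = (nxt, r)
        # Dyna-Q planning: replay sampled experience from the learned model
        for _ in range(10):
            (ps, pa), (pn, pr) = random.choice(list(model.items()))
            pbest = max(Q[(pn, b)] for b in ACTIONS)
            Q[(ps, pa)] += ALPHA * (pr + GAMMA * pbest - Q[(ps, pa)])
        s = nxt
        if s == GOAL:
            break

# Read out a path by repeatedly following the combined evaluation
path, s = [START], START
while s != GOAL and len(path) < 50:
    s, _ = step(s, greedy_action(s))
    path.append(s)
print(path)

The weight W controls how strongly the geometric heuristic biases action selection before the Q-values have converged; setting W to zero recovers plain Dyna-Q.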

Bibliographic details

Main authors: ZHANG XIULING, KANG XUENAN, LI JINXIANG
Format: Patent
Language: Chinese; English
Published: 2020-11-06
Online access: Order full text via esp@cenet (https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20201106&DB=EPODOC&CC=CN&NR=111896006A)
Record ID: cdi_epo_espacenet_CN111896006A
Source: esp@cenet
Subjects:
CONTROL OR REGULATING SYSTEMS IN GENERAL
CONTROLLING
FUNCTIONAL ELEMENTS OF SUCH SYSTEMS
GYROSCOPIC INSTRUMENTS
MEASURING
MEASURING DISTANCES, LEVELS OR BEARINGS
MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
NAVIGATION
PHOTOGRAMMETRY OR VIDEOGRAMMETRY
PHYSICS
REGULATING
SURVEYING
SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
TESTING