Path planning method and system based on reinforcement learning and heuristic search

The invention discloses a path planning method and system based on reinforcement learning and heuristic search. The method comprises the following steps: S1, establishing an environment model under the Markov decision process framework, wherein the state space of the environment model is S, the action space is A, the reward function is R, and the transition probability function is P; S2, performing sampling updates on the environment model through the Dyna-Q algorithm, evaluating each state-action pair, and determining a target point; S3, based on the target point, calculating, through the A* algorithm, the Euclidean distances between the current position and the starting point and between the current position and the target point, and determining an initial path; S4, assigning a value to each state-action pair in the initial path; S5, determining an optimal action according to the evaluation value and assignment of each state-action pair; and S6, determining …
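For orientation only, the following is a minimal sketch of the general idea described in the abstract: Dyna-Q sampling updates evaluate each state-action pair (S2), the Euclidean distances to the starting point and to the target point supply an A*-style heuristic assignment (S3-S4), and the greedy action is chosen from the learned evaluation value plus that assignment (S5). The grid dimensions, obstacle cells, reward values, weighting factor W, and learning hyperparameters are illustrative assumptions and are not taken from the patent; this is not the claimed implementation.

# Hypothetical sketch: Dyna-Q planning combined with an A*-style Euclidean
# heuristic on a small grid world. All numeric values are assumptions.
import math
import random
from collections import defaultdict

random.seed(0)

ROWS, COLS = 6, 6
START, GOAL = (0, 0), (5, 5)
OBSTACLES = {(2, 2), (2, 3), (3, 3)}          # assumed obstacle cells
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right

def step(state, action):
    """Environment model (S1): deterministic transitions, simple rewards."""
    nxt = (state[0] + action[0], state[1] + action[1])
    if not (0 <= nxt[0] < ROWS and 0 <= nxt[1] < COLS) or nxt in OBSTACLES:
        return state, -1.0          # bump into wall/obstacle: stay, small penalty
    if nxt == GOAL:
        return nxt, 10.0            # reaching the target point
    return nxt, -0.1                # ordinary step cost

def euclidean(a, b):
    return math.hypot(a[0] - b[0], a[1] - b[1])

def heuristic_bias(state):
    """A*-style evaluation (S3): distance from the start plus distance to the goal."""
    return -(euclidean(state, START) + euclidean(state, GOAL))

Q = defaultdict(float)   # evaluation value of each state-action pair
model = {}               # learned model used for Dyna-Q planning updates
ALPHA, GAMMA, EPSILON, W = 0.1, 0.95, 0.1, 0.2   # W weights the heuristic bias

def greedy_action(state):
    """S5: score each action by its learned Q-value plus the heuristic bias
    of the state it leads to, and pick the best one."""
    scored = []
    for a in ACTIONS:
        nxt, _ = step(state, a)
        scored.append((Q[(state, a)] + W * heuristic_bias(nxt), a))
    return max(scored)[1]

for episode in range(300):
    s = START
    for _ in range(200):
        a = random.choice(ACTIONS) if random.random() < EPSILON else greedy_action(s)
        nxt, r = step(s, a)
        # Direct RL update of the state-action evaluation (S2)
        best_next = max(Q[(nxt, b)] for b in ACTIONS)
        Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
        model[(s, a)] = (nxt, r)
        # Dyna-Q planning: replay sampled experience from the learned model
        for _ in range(10):
            (ps, pa), (pn, pr) = random.choice(list(model.items()))
            pbest = max(Q[(pn, b)] for b in ACTIONS)
            Q[(ps, pa)] += ALPHA * (pr + GAMMA * pbest - Q[(ps, pa)])
        s = nxt
        if s == GOAL:
            break

# Read out a path by repeatedly following the combined evaluation
path, s = [START], START
while s != GOAL and len(path) < 50:
    s, _ = step(s, greedy_action(s))
    path.append(s)
print(path)

The weight W controls how strongly the geometric heuristic biases action selection before the Q-values have converged; setting W to zero recovers plain Dyna-Q.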

Bibliographic details

Main authors: ZHANG XIULING, KANG XUENAN, LI JINXIANG
Format: Patent
Language: Chinese; English
Published: 2020-11-06
Online access: Order full text via esp@cenet (https://worldwide.espacenet.com/publicationDetails/biblio?FT=D&date=20201106&DB=EPODOC&CC=CN&NR=111896006A)
Record ID: cdi_epo_espacenet_CN111896006A
Source: esp@cenet
Subjects:
CONTROL OR REGULATING SYSTEMS IN GENERAL
CONTROLLING
FUNCTIONAL ELEMENTS OF SUCH SYSTEMS
GYROSCOPIC INSTRUMENTS
MEASURING
MEASURING DISTANCES, LEVELS OR BEARINGS
MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
NAVIGATION
PHOTOGRAMMETRY OR VIDEOGRAMMETRY
PHYSICS
REGULATING
SURVEYING
SYSTEMS FOR CONTROLLING OR REGULATING NON-ELECTRIC VARIABLES
TESTING