Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction

This paper presents a model-free reinforcement learning (RL) algorithm to synthesize a control policy that maximizes the satisfaction probability of linear temporal logic (LTL) specifications. To account for environment and motion uncertainties, the robot motion is modeled as a probabilistic labeled Markov decision process with unknown transition probabilities and unknown probabilistic label functions. The LTL task specification is converted into a limit-deterministic generalized Büchi automaton (LDGBA) with several accepting sets so that rewards remain dense during learning. The novelty of applying the LDGBA lies in constructing an embedded LDGBA (E-LDGBA) through a synchronous tracking-frontier function, which records the non-visited accepting sets without increasing dimensional or computational complexity. With appropriately designed dependent reward and discount functions, rigorous analysis shows that any method optimizing the expected discounted return of the RL-based approach is guaranteed to find an optimal policy that maximizes the satisfaction probability of the LTL specifications. A model-free RL-based motion planning strategy is developed to generate this optimal policy, and the effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.
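The abstract's central mechanism, an embedded LDGBA (E-LDGBA) whose synchronous tracking-frontier function records the accepting sets not yet visited and thereby keeps the reward dense, can be illustrated with a small sketch. The Python snippet below is a minimal reconstruction from the abstract alone: the function names, the frontier reset rule, and the reward constant are assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code) of the tracking-frontier idea:
# the E-LDGBA keeps a "frontier" of LDGBA accepting sets that have not yet
# been visited, and the reward is dense because reaching any frontier set
# pays off. Names and constants are assumptions made for this example.

def update_frontier(q, frontier, accepting_sets):
    """Synchronous tracking-frontier update: drop every accepting set that
    contains the current automaton state q; once all sets have been visited
    in a round (frontier empty), reset it for the next round."""
    remaining = [F for F in frontier if q not in F]
    if not remaining:  # all accepting sets visited in this round
        remaining = [F for F in accepting_sets if q not in F] or list(accepting_sets)
    return remaining

def reward(q, frontier, r_accept=1.0):
    """Dense reward: positive only when q lies in an accepting set that is
    still on the frontier, zero otherwise (illustrative constant)."""
    return r_accept if any(q in F for F in frontier) else 0.0

# Example: an LDGBA with two accepting sets over automaton states {0, 1, 2, 3}.
accepting_sets = [frozenset({1}), frozenset({3})]
frontier = list(accepting_sets)
for q in [0, 1, 2, 3, 1]:
    print(q, reward(q, frontier), end=" -> ")
    frontier = update_frontier(q, frontier, accepting_sets)
    print([set(F) for F in frontier])
```

In an RL loop this would run on the product of the labeled MDP and the automaton: after each transition, update_frontier is called with the new automaton state and reward supplies the learning signal, so progress toward every accepting set is rewarded rather than only completion of the whole task.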

Bibliographic Details
Published in: arXiv.org 2021-10
Main Authors: Cai, Mingyu, Xiao, Shaoping, Li, Baoluo, Li, Zhiliang, Kan, Zhen
Format: Article
Language: English
Online Access: Full text
container_title arXiv.org
creator Cai, Mingyu
Xiao, Shaoping
Li, Baoluo
Li, Zhiliang
Kan, Zhen
doi_str_mv 10.48550/arxiv.2010.06797
format Article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2021-10
issn 2331-8422
language eng
recordid cdi_arxiv_primary_2010_06797
source arXiv.org; Free E-Journals
subjects Algorithms
Computer Science - Artificial Intelligence
Computer Science - Formal Languages and Automata Theory
Computer Science - Robotics
Machine learning
Markov processes
Mathematics - Optimization and Control
Motion planning
Robot dynamics
Specifications
Statistical analysis
Temporal logic
Transition probabilities
title Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T17%3A45%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Reinforcement%20Learning%20Based%20Temporal%20Logic%20Control%20with%20Maximum%20Probabilistic%20Satisfaction&rft.jtitle=arXiv.org&rft.au=Cai,%20Mingyu&rft.date=2021-10-05&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2010.06797&rft_dat=%3Cproquest_arxiv%3E2451455784%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2451455784&rft_id=info:pmid/&rfr_iscdi=true