Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction

This paper presents a model-free reinforcement learning (RL) algorithm to synthesize a control policy that maximizes the satisfaction probability of linear temporal logic (LTL) specifications. To account for environment and motion uncertainties, the robot motion is modeled as a probabilistic labeled Markov decision process with unknown transition probabilities and unknown probabilistic label functions. The LTL task specification is converted into a limit-deterministic generalized Büchi automaton (LDGBA) with several accepting sets so that rewards remain dense during learning. The novelty of applying the LDGBA lies in constructing an embedded LDGBA (E-LDGBA) through a synchronous tracking-frontier function, which records the non-visited accepting sets without increasing dimensional or computational complexity. With appropriately designed dependent reward and discount functions, rigorous analysis shows that any method optimizing the expected discounted return of the RL-based approach is guaranteed to find an optimal policy that maximizes the satisfaction probability of the LTL specifications. A model-free RL-based motion planning strategy is developed to generate this optimal policy, and the effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results.
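The abstract's central mechanism, an embedded LDGBA (E-LDGBA) whose synchronous tracking-frontier function records the accepting sets not yet visited and thereby keeps the reward dense, can be illustrated with a small sketch. The Python snippet below is a minimal reconstruction from the abstract alone: the function names, the frontier reset rule, and the reward constant are assumptions, not the authors' implementation.

```python
# Illustrative sketch (not the authors' code) of the tracking-frontier idea:
# the E-LDGBA keeps a "frontier" of LDGBA accepting sets that have not yet
# been visited, and the reward is dense because reaching any frontier set
# pays off. Names and constants are assumptions made for this example.

def update_frontier(q, frontier, accepting_sets):
    """Synchronous tracking-frontier update: drop every accepting set that
    contains the current automaton state q; once all sets have been visited
    in a round (frontier empty), reset it for the next round."""
    remaining = [F for F in frontier if q not in F]
    if not remaining:  # all accepting sets visited in this round
        remaining = [F for F in accepting_sets if q not in F] or list(accepting_sets)
    return remaining

def reward(q, frontier, r_accept=1.0):
    """Dense reward: positive only when q lies in an accepting set that is
    still on the frontier, zero otherwise (illustrative constant)."""
    return r_accept if any(q in F for F in frontier) else 0.0

# Example: an LDGBA with two accepting sets over automaton states {0, 1, 2, 3}.
accepting_sets = [frozenset({1}), frozenset({3})]
frontier = list(accepting_sets)
for q in [0, 1, 2, 3, 1]:
    print(q, reward(q, frontier), end=" -> ")
    frontier = update_frontier(q, frontier, accepting_sets)
    print([set(F) for F in frontier])
```

In an RL loop this would run on the product of the labeled MDP and the automaton: after each transition, update_frontier is called with the new automaton state and reward supplies the learning signal, so progress toward every accepting set is rewarded rather than only completion of the whole task.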

Bibliographic Details
Published in: arXiv.org 2021-10
Main Authors: Cai, Mingyu, Xiao, Shaoping, Li, Baoluo, Li, Zhiliang, Kan, Zhen
Format: Article
Language: English
Online Access: Full text
container_title arXiv.org
creator Cai, Mingyu
Xiao, Shaoping
Li, Baoluo
Li, Zhiliang
Kan, Zhen
doi_str_mv 10.48550/arxiv.2010.06797
format Article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2021-10
issn 2331-8422
language eng
recordid cdi_arxiv_primary_2010_06797
source arXiv.org; Free E-Journals
subjects Algorithms
Computer Science - Artificial Intelligence
Computer Science - Formal Languages and Automata Theory
Computer Science - Robotics
Machine learning
Markov processes
Mathematics - Optimization and Control
Motion planning
Robot dynamics
Specifications
Statistical analysis
Temporal logic
Transition probabilities
title Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T17%3A45%3A50IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_arxiv&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Reinforcement%20Learning%20Based%20Temporal%20Logic%20Control%20with%20Maximum%20Probabilistic%20Satisfaction&rft.jtitle=arXiv.org&rft.au=Cai,%20Mingyu&rft.date=2021-10-05&rft.eissn=2331-8422&rft_id=info:doi/10.48550/arxiv.2010.06797&rft_dat=%3Cproquest_arxiv%3E2451455784%3C/proquest_arxiv%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2451455784&rft_id=info:pmid/&rfr_iscdi=true