Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction
This paper presents a model-free reinforcement learning (RL) algorithm to synthesize a control policy that maximizes the satisfaction probability of linear temporal logic (LTL) specifications. To account for environment and motion uncertainties, we model the robot motion as a probabilistic labeled Markov decision process with unknown transition probabilities and unknown probabilistic label functions.
Saved in:
Published in: | arXiv.org 2021-10 |
---|---|
Main authors: | Cai, Mingyu; Xiao, Shaoping; Li, Baoluo; Li, Zhiliang; Kan, Zhen |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Cai, Mingyu; Xiao, Shaoping; Li, Baoluo; Li, Zhiliang; Kan, Zhen |
description | This paper presents a model-free reinforcement learning (RL) algorithm to synthesize a control policy that maximizes the satisfaction probability of linear temporal logic (LTL) specifications. To account for environment and motion uncertainties, we model the robot motion as a probabilistic labeled Markov decision process with unknown transition probabilities and unknown probabilistic label functions. The LTL task specification is converted to a limit-deterministic generalized Büchi automaton (LDGBA) with several accepting sets to maintain dense rewards during learning. The novelty of applying the LDGBA is to construct an embedded LDGBA (E-LDGBA) by designing a synchronous tracking-frontier function, which records the not-yet-visited accepting sets without increasing the dimensional and computational complexity. With appropriately designed reward and discount functions, rigorous analysis shows that any method optimizing the expected discounted return of the RL-based approach is guaranteed to find an optimal policy that maximizes the satisfaction probability of the LTL specifications. A model-free RL-based motion planning strategy is developed in this paper to generate the optimal policy. The effectiveness of the RL-based control synthesis is demonstrated via simulation and experimental results. |
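The synchronous tracking-frontier bookkeeping described in the abstract can be sketched concretely. The Python fragment below is a minimal illustration of one plausible reading: the frontier holds the indices of the LDGBA accepting sets not yet visited, entering a state of a frontier set removes that set and triggers a reward, and an empty frontier is reset so the reward signal stays dense. All names, the reset rule, and the reward value are assumptions made for illustration, not the authors' implementation.

```python
# Hypothetical sketch of a tracking-frontier update for an E-LDGBA.
# accepting_sets: list of sets of automaton states, one per accepting set F_i.
# frontier:       frozenset of indices of accepting sets not yet visited.

def frontier_update(q, frontier, accepting_sets):
    """Return (new_frontier, discharged) after the automaton enters state q.

    discharged is True when q belongs to an accepting set that was still on
    the frontier, i.e. progress toward the generalized Buechi condition.
    """
    visited = {i for i in frontier if q in accepting_sets[i]}
    new_frontier = frontier - visited
    if not new_frontier:
        # Every accepting set has been visited at least once: reset the
        # frontier (excluding sets that q already discharges) so the learner
        # keeps receiving informative rewards round after round.
        new_frontier = frozenset(
            i for i in range(len(accepting_sets)) if q not in accepting_sets[i]
        )
    return frozenset(new_frontier), bool(visited)


def reward(discharged, r_accept=1.0):
    """Illustrative reward: pay r_accept only when a frontier set is discharged."""
    return r_accept if discharged else 0.0
```

In a product-MDP view, a model-free learner such as Q-learning would carry (s, q, frontier) as its state and consume reward(discharged) together with a state-dependent discount; the exact reward and discount design used in the paper should be taken from the publication itself.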
doi_str_mv | 10.48550/arxiv.2010.06797 |
format | Article |
publisher | Cornell University Library, arXiv.org (Ithaca) |
published version | https://doi.org/10.1109/ICRA48506.2021.9561903 |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2021-10 |
issn | 2331-8422 |
language | eng |
recordid | cdi_arxiv_primary_2010_06797 |
source | arXiv.org; Free E-Journals |
subjects | Algorithms; Computer Science - Artificial Intelligence; Computer Science - Formal Languages and Automata Theory; Computer Science - Robotics; Machine learning; Markov processes; Mathematics - Optimization and Control; Motion planning; Robot dynamics; Specifications; Statistical analysis; Temporal logic; Transition probabilities |
title | Reinforcement Learning Based Temporal Logic Control with Maximum Probabilistic Satisfaction |