Follow The Rules: Online Signal Temporal Logic Tree Search for Guided Imitation Learning in Stochastic Domains

Seamlessly integrating rules into Learning-from-Demonstrations (LfD) policies is a critical requirement for the real-world deployment of AI agents. Recently, Signal Temporal Logic (STL) has been shown to be an effective language for encoding rules as spatio-temporal constraints. This work uses Monte Carlo Tree Search (MCTS) as a means of integrating an STL specification into a vanilla LfD policy to improve constraint satisfaction. We propose augmenting the MCTS heuristic with STL robustness values to bias the tree search towards branches with higher constraint satisfaction. While the domain-independent method can be applied to integrate STL rules online into any pre-trained LfD algorithm, we choose goal-conditioned Generative Adversarial Imitation Learning as the offline LfD policy. We apply the proposed method to the domain of planning trajectories for General Aviation aircraft around a non-towered airfield. Results using a simulator trained on real-world data show a 60% improvement in performance over baseline LfD methods that do not use STL heuristics.
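
To make the idea concrete, the following minimal sketch (in Python) shows one way a UCT-style MCTS selection rule could be biased by an STL robustness term, as the abstract describes. It is an illustrative sketch only, not the authors' implementation: the weight LAMBDA_STL, the helper stl_robustness, and the toy "always keep altitude above a minimum" rule are assumptions introduced here for illustration.

# Illustrative sketch: UCT selection augmented with an STL robustness bonus.
# LAMBDA_STL, stl_robustness, and the minimum-altitude rule are assumptions,
# not the paper's actual specification or code.
import math
from dataclasses import dataclass, field

LAMBDA_STL = 1.0   # weight of the STL robustness bonus (hypothetical)
C_UCT = 1.4        # standard UCT exploration constant
MIN_ALT = 300.0    # toy rule: "always keep altitude above 300 ft"

def stl_robustness(trace):
    """Robustness of G(alt > MIN_ALT) over a finite trace of altitudes.

    For an 'always' (globally) formula, robustness is the minimum margin
    by which the predicate holds; negative values indicate a violation."""
    return min(alt - MIN_ALT for alt in trace)

@dataclass
class Node:
    trace: list            # partial trajectory (here: altitude history)
    visits: int = 0
    value: float = 0.0     # accumulated return from rollouts
    children: list = field(default_factory=list)

def uct_score(child, parent_visits):
    if child.visits == 0:
        return float("inf")  # expand unvisited children first
    exploit = child.value / child.visits
    explore = C_UCT * math.sqrt(math.log(parent_visits) / child.visits)
    # STL-augmented heuristic: bias selection toward branches whose partial
    # trace already satisfies the rule with a larger margin.
    rule_bonus = LAMBDA_STL * stl_robustness(child.trace)
    return exploit + explore + rule_bonus

def select_child(parent):
    return max(parent.children, key=lambda c: uct_score(c, parent.visits))

if __name__ == "__main__":
    root = Node(trace=[500.0], visits=10)
    # Two candidate branches: one descends below the altitude rule, one climbs.
    root.children = [
        Node(trace=[500.0, 280.0], visits=3, value=2.5),  # violates the rule
        Node(trace=[500.0, 520.0], visits=3, value=2.0),  # satisfies the rule
    ]
    print("selected trace:", select_child(root).trace)

With this weighting, branches whose partial traces violate the rule receive a negative bonus and are explored less, while the usual exploitation and exploration terms are left unchanged; LAMBDA_STL would control how strongly rule satisfaction is traded off against imitation return.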

Bibliographic details
Main authors: Aloor, Jasmine Jerry; Patrikar, Jay; Kapoor, Parv; Oh, Jean; Scherer, Sebastian
Format: Article
Language: eng
Subjects: Computer Science - Robotics
Online access: Order full text
creator Aloor, Jasmine Jerry; Patrikar, Jay; Kapoor, Parv; Oh, Jean; Scherer, Sebastian
doi_str_mv 10.48550/arxiv.2209.13737
format Article
identifier DOI: 10.48550/arxiv.2209.13737
language eng
recordid cdi_arxiv_primary_2209_13737
source arXiv.org
subjects Computer Science - Robotics
title Follow The Rules: Online Signal Temporal Logic Tree Search for Guided Imitation Learning in Stochastic Domains
url https://arxiv.org/abs/2209.13737