Follow The Rules: Online Signal Temporal Logic Tree Search for Guided Imitation Learning in Stochastic Domains
Seamlessly integrating rules in Learning-from-Demonstrations (LfD) policies is a critical requirement to enable the real-world deployment of AI agents. Recently, Signal Temporal Logic (STL) has been shown to be an effective language for encoding rules as spatio-temporal constraints. This work uses Monte Carlo Tree Search (MCTS) to integrate STL specifications into a vanilla LfD policy and improve constraint satisfaction.
Saved in:
Main authors: | Aloor, Jasmine Jerry; Patrikar, Jay; Kapoor, Parv; Oh, Jean; Scherer, Sebastian |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Robotics |
Online access: | Order full text |
creator | Aloor, Jasmine Jerry; Patrikar, Jay; Kapoor, Parv; Oh, Jean; Scherer, Sebastian |
description | Seamlessly integrating rules in Learning-from-Demonstrations (LfD) policies
is a critical requirement to enable the real-world deployment of AI agents.
Recently, Signal Temporal Logic (STL) has been shown to be an effective
language for encoding rules as spatio-temporal constraints. This work uses
Monte Carlo Tree Search (MCTS) as a means of integrating STL specifications into
a vanilla LfD policy to improve constraint satisfaction. We propose augmenting
the MCTS heuristic with STL robustness values to bias the tree search towards
branches with higher constraint satisfaction. While the domain-independent
method can be applied to integrate STL rules online into any pre-trained LfD
algorithm, we choose goal-conditioned Generative Adversarial Imitation Learning
as the offline LfD policy. We apply the proposed method to the domain of
planning trajectories for General Aviation aircraft around a non-towered
airfield. Using a simulator trained on real-world data, results show a 60%
performance improvement over baseline LfD methods that do not use STL heuristics. |
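The description above sketches the core idea: augment the MCTS selection heuristic with STL robustness values so the tree search prefers branches with higher constraint satisfaction. A minimal illustration of that idea follows; all names here (`stl_robustness`, `ucb_with_stl`, the altitude predicate, and the weight `lam`) are illustrative assumptions, not the paper's actual implementation.

```python
# Hedged sketch: bias MCTS node selection with an STL robustness bonus.
# The predicate, function names, and weights are assumptions for
# illustration, not the authors' implementation.
import math

def stl_robustness(trajectory, min_alt=100.0):
    """Robustness of G(alt > min_alt): the worst-case margin over the signal.

    Positive means the constraint holds at every step (by that margin);
    negative means at least one step violates it.
    """
    return min(alt - min_alt for alt in trajectory)

def ucb_with_stl(node_value, node_visits, parent_visits, robustness,
                 c=1.4, lam=0.5):
    """Standard UCB1 score plus a weighted STL robustness term."""
    exploit = node_value / node_visits
    explore = c * math.sqrt(math.log(parent_visits) / node_visits)
    return exploit + explore + lam * robustness

# Two branches with equal returns and visit counts: the rollout that keeps
# a larger altitude margin receives the higher selection score.
safe = ucb_with_stl(10.0, 5, 20, stl_robustness([150.0, 140.0, 160.0]))
risky = ucb_with_stl(10.0, 5, 20, stl_robustness([150.0, 90.0, 160.0]))
assert safe > risky
```

Under this sketch, the robustness term acts exactly like a shaped bonus in the selection rule, so branches that satisfy the STL constraint by a wider margin are expanded first.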
doi_str_mv | 10.48550/arxiv.2209.13737 |
format | Article |
creationdate | 2022-09-27 |
rights | http://arxiv.org/licenses/nonexclusive-distrib/1.0 |
oa | free_for_read |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2209.13737 |
language | eng |
recordid | cdi_arxiv_primary_2209_13737 |
source | arXiv.org |
subjects | Computer Science - Robotics |
title | Follow The Rules: Online Signal Temporal Logic Tree Search for Guided Imitation Learning in Stochastic Domains |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-29T03%3A04%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Follow%20The%20Rules:%20Online%20Signal%20Temporal%20Logic%20Tree%20Search%20for%20Guided%20Imitation%20Learning%20in%20Stochastic%20Domains&rft.au=Aloor,%20Jasmine%20Jerry&rft.date=2022-09-27&rft_id=info:doi/10.48550/arxiv.2209.13737&rft_dat=%3Carxiv_GOX%3E2209_13737%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |