Regular Decision Processes for Grid Worlds

Markov decision processes are typically used for sequential decision making under uncertainty. For many aspects, however, ranging from constrained or safe specifications to various kinds of temporal (non-Markovian) dependencies in task and reward structures, extensions are needed. To that end, interest has grown in recent years in combinations of reinforcement learning and temporal logic, that is, of flexible behavior-learning methods with robust verification and guarantees. In this paper we describe an experimental investigation of the recently introduced regular decision processes, which support both non-Markovian reward functions and non-Markovian transition functions. In particular, we provide a tool chain for regular decision processes, algorithmic extensions relating to online, incremental learning, an empirical evaluation of model-free and model-based solution algorithms, and applications in regular, but non-Markovian, grid worlds.
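As a rough illustration of what a non-Markovian ("regular") reward in a grid world means, here is a minimal Python sketch. It is not taken from the paper or its tool chain; the layout, the cell names, and the class `RewardAutomaton` are hypothetical. Reward is paid at the goal only if a designated cell A was visited earlier in the episode, and a small finite automaton over the history captures that dependency.

```python
# Minimal sketch (hypothetical, not the paper's tool chain): a grid world whose
# reward is non-Markovian -- the agent is rewarded at the goal only if it has
# visited cell "A" earlier in the episode. A two-state finite automaton over
# the history tracks that condition.

GRID = {              # illustrative 3x3 layout
    "A": (0, 2),      # must be visited before the goal
    "goal": (2, 2),   # reward is paid here only if A was seen
}

class RewardAutomaton:
    """Two-state automaton recording whether cell A has been visited."""
    def __init__(self):
        self.state = "start"          # transitions once: "start" -> "seen_a"

    def step(self, pos):
        if self.state == "start" and pos == GRID["A"]:
            self.state = "seen_a"
        return self.state

def reward(pos, automaton_state):
    # Depends on the history (via the automaton state), not on the grid
    # position alone -- this is what makes the reward non-Markovian.
    return 1.0 if pos == GRID["goal"] and automaton_state == "seen_a" else 0.0

# Illustrative trajectory that passes through A and then reaches the goal.
trajectory = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2)]
aut = RewardAutomaton()
total = 0.0
for pos in trajectory:
    q = aut.step(pos)
    total += reward(pos, q)
print(total)  # 1.0, because A=(0,2) was visited before the goal (2,2)
```

Solution methods can then treat the pair (grid position, automaton state) as an enlarged state that is Markovian again, which is, loosely, how such history-dependent structure becomes amenable to both model-free and model-based learning.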

Bibliographic Details
Main authors: Lenaers, Nicky; van Otterlo, Martijn
Format: Article
Language: eng
Subjects: Computer Science - Artificial Intelligence
Online access: https://arxiv.org/abs/2111.03647
DOI: 10.48550/arxiv.2111.03647
Source: arXiv.org
Published: 2021-11-05