The configurable tree graph (CT-graph): measurable problems in partially observable and distal reward environments for lifelong reinforcement learning

This paper introduces a set of formally defined and transparent problems for reinforcement learning algorithms with the following characteristics: (1) variable degrees of observability (non-Markov observations), (2) distal and sparse rewards, (3) variable and hierarchical reward structure, (4) multiple-task generation, (5) variable problem complexity.

Detailed Description

Bibliographic Details
Published in: arXiv.org 2023-01
Main Authors: Soltoggio, Andrea; Ben-Iwhiwhu, Eseoghene; Peridis, Christos; Ladosz, Pawel; Dick, Jeffery; Pilly, Praveen K; Kolouri, Soheil
Format: Article
Language: English
Subjects:
Online Access: Full text
description This paper introduces a set of formally defined and transparent problems for reinforcement learning algorithms with the following characteristics: (1) variable degrees of observability (non-Markov observations), (2) distal and sparse rewards, (3) variable and hierarchical reward structure, (4) multiple-task generation, (5) variable problem complexity. The environment provides 1D or 2D categorical observations, and takes actions as input. The core structure of the CT-graph is a multi-branch tree graph with arbitrary branching factor, depth, and observation sets that can be varied to increase the dimensions of the problem in a controllable and measurable way. Two main categories of states, decision states and wait states, are devised to create a hierarchy of importance among observations, typical of real-world problems. A large observation set can produce a vast set of histories that impairs memory-augmented agents. Variable reward functions allow for the easy creation of multiple tasks and the ability of an agent to efficiently adapt in dynamic scenarios where tasks with controllable degrees of similarities are presented. Challenging complexity levels can be easily achieved due to the exponential growth of the graph. The problem formulation and accompanying code provide a fast, transparent, and mathematically defined set of configurable tests to compare the performance of reinforcement learning algorithms, in particular in lifelong learning settings.
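The graph structure described above can be made concrete with a small sketch. The following is a rough illustration only, not the authors' released CT-graph code: the class name, the categorical observation encoding (0 = wait state, 1 = decision state, 2 = leaf), and the choice to end the episode on a wrong action in a wait state are all assumptions. It shows how a tree with branching factor `b` and depth `d`, interleaved wait states, and a single rewarded leaf yields a sparse, distal reward, and how reseeding the goal leaf generates a new task over the same graph.

```python
import random

class MiniCTGraph:
    """Minimal sketch of a CT-graph-style environment (hypothetical API).

    The agent starts at the root of a tree with branching factor `b` and
    depth `d`. Each decision state is preceded by `wait` wait states in
    which only action 0 makes progress. Reward 1 is given only at one
    designated leaf, making the reward both sparse and distal.
    """

    def __init__(self, b=2, d=2, wait=1, seed=0):
        self.b, self.d, self.wait = b, d, wait
        rng = random.Random(seed)
        # The rewarded leaf is identified by the sequence of branch
        # choices leading to it; changing the seed defines a new task.
        self.goal = tuple(rng.randrange(b) for _ in range(d))
        self.reset()

    def reset(self):
        self.path = []                  # branch choices made so far
        self.wait_left = self.wait
        return self._obs()

    def _obs(self):
        # Categorical observation: 0 = wait state, 1 = decision state.
        return 0 if self.wait_left > 0 else 1

    def step(self, action):
        if self.wait_left > 0:          # wait state: only action 0 advances
            if action == 0:
                self.wait_left -= 1
                return self._obs(), 0.0, False
            return self._obs(), 0.0, True   # wrong action ends the episode
        self.path.append(action % self.b)   # decision state: pick a branch
        if len(self.path) == self.d:        # reached a leaf (observation 2)
            reward = 1.0 if tuple(self.path) == self.goal else 0.0
            return 2, reward, True
        self.wait_left = self.wait
        return self._obs(), 0.0, False
```

Note how the number of leaves, `b ** d`, grows exponentially in the depth, which is the property the abstract exploits to scale problem complexity in a controllable way.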
publisher Ithaca: Cornell University Library, arXiv.org
date 2023-01-21
rights 2023. This work is published under http://creativecommons.org/licenses/by/4.0/ (the "License"). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
identifier EISSN: 2331-8422
source Free E-Journals
subjects Algorithms
Controllability
Lifelong learning
Machine learning
Mathematical analysis
Task complexity