The configurable tree graph (CT-graph): measurable problems in partially observable and distal reward environments for lifelong reinforcement learning
| Field | Value |
|---|---|
| Main Authors: | , , , , , , |
| Format: | Article |
| Language: | eng |
Summary: This paper introduces a set of formally defined and transparent problems for reinforcement learning algorithms with the following characteristics: (1) variable degrees of observability (non-Markov observations), (2) distal and sparse rewards, (3) variable and hierarchical reward structure, (4) multiple-task generation, (5) variable problem complexity. The environment provides 1D or 2D categorical observations and takes actions as input. The core structure of the CT-graph is a multi-branch tree graph with arbitrary branching factor, depth, and observation sets that can be varied to increase the dimensions of the problem in a controllable and measurable way. Two main categories of states, decision states and wait states, are devised to create a hierarchy of importance among observations, typical of real-world problems. A large observation set can produce a vast set of histories that impairs memory-augmented agents. Variable reward functions allow for the easy creation of multiple tasks and enable an agent to adapt efficiently in dynamic scenarios where tasks with controllable degrees of similarity are presented. Challenging complexity levels can be reached easily owing to the exponential growth of the graph. The problem formulation and accompanying code provide a fast, transparent, and mathematically defined set of configurable tests to compare the performance of reinforcement learning algorithms, in particular in lifelong learning settings.
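
To make the summary's description concrete, the sketch below implements a toy CT-graph-style environment. It is not the authors' released code: the class name `SimpleCTGraph`, its parameters, and the two-integer observation scheme are illustrative assumptions, whereas the actual CT-graph exposes configurable 1D/2D observation sets, wait-state delays, and reward functions.

```python
import random

class SimpleCTGraph:
    """A toy CT-graph-style environment (illustrative sketch only).

    `branching` (b) and `depth` (d) define a tree with b**d leaves;
    `wait_steps` inserts wait states before each decision state, making
    the single leaf reward both distal and sparse. Observations are
    categorical integers: 0 for every wait state and 1 for every
    decision state, so decision states are aliased and the agent faces
    a partially observable (non-Markov) problem, a crude stand-in for
    the paper's configurable observation sets.
    """

    WAIT_ACTION = 0  # the only correct action in a wait state

    def __init__(self, branching=2, depth=3, wait_steps=2, seed=0):
        self.b = branching
        self.d = depth
        self.wait_steps = wait_steps
        rng = random.Random(seed)
        # A task is defined by which leaf is rewarded, i.e. one sequence
        # of decisions; resampling this tuple yields a new task on the
        # same graph, mirroring the paper's multiple-task generation.
        self.goal_path = tuple(rng.randrange(branching) for _ in range(depth))
        self.reset()

    def reset(self):
        self.path = []                     # decisions taken so far
        self.wait_left = self.wait_steps   # wait phase before first decision
        return self._obs()

    def _obs(self):
        return 0 if self.wait_left > 0 else 1

    def step(self, action):
        """Apply one action; returns (observation, reward, done)."""
        if self.wait_left > 0:
            if action != self.WAIT_ACTION:
                return self._obs(), 0.0, True   # wrong action: episode fails
            self.wait_left -= 1
            return self._obs(), 0.0, False
        branch = action - 1                     # decision actions are 1..b
        if not 0 <= branch < self.b:
            return self._obs(), 0.0, True       # invalid decision: episode fails
        self.path.append(branch)
        if len(self.path) == self.d:            # leaf reached
            reward = 1.0 if tuple(self.path) == self.goal_path else 0.0
            return self._obs(), reward, True
        self.wait_left = self.wait_steps        # wait again before next decision
        return self._obs(), 0.0, False
```

With the assumed defaults (branching 2, depth 3, two wait steps per decision), an episode lasts nine steps and only one of the 2^3 = 8 leaves pays a reward, so a uniformly random decision policy that always waits correctly succeeds on 1/8 of episodes. Raising the parameters to, say, branching 4 and depth 5 yields 4^5 = 1024 leaves, illustrating how the graph's exponential growth provides the controllable complexity the summary describes.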
DOI: 10.48550/arxiv.2302.10887