Reinforcement Learning and Dynamical Systems

Bibliographic details
Main author:	Lindenberg, Björn
Format:	Dissertation
Language:	English
Keywords:	artificial intelligence; Bayesian bandits; Bellman operators; Computer Science; conjugate priors; cycle orbits; Datavetenskap; deep learning; distributional reinforcement learning; fixed-point systems; linear finite dynamical systems; Markov decision processes; Matematik; Mathematics; multi-armed bandits; Thompson sampling

Description
Summary:	This thesis concerns reinforcement learning and dynamical systems in finite discrete problem domains. Reinforcement learning, as a field of artificial intelligence, involves developing models and algorithms for scenarios in which an agent interacts with an environment. By taking actions, the agent may induce changes in the observed environment, and a modeled reward system reinforces correct behavior through learning algorithms. Reinforcement learning can be used in a wide variety of domains, such as finance, robotics, games, routing and health care. However, as the subject matures, there is an increasing need to rely more heavily on advanced concepts in mathematics and deep learning to further our understanding of existing problems and to find new algorithmic insights. Discrete dynamical systems arise in virtually any setting where a set of elements is subjected to iteration by a defining function. The function may be seen as representing the passing of time or as defining the rules for state transitions. If the set of elements is finite but very large, applications arise in fields such as operations research, cryptography and biology, where understanding the structure and long-term behavior without explicit enumeration is key. In Paper I we extend the model of categorical reinforcement learning with a group-aided training procedure involving multiple agents. By having the agents learn through shared distributional information but act independently, we argue for an accelerated learning process. We show empirically that the procedure may lead to much more robust learning, stronger individual agent performance and good ensemble efficiency. In Paper II we continue to build upon distributional reinforcement learning for finite Markov processes. A common approach among algorithms is to apply transformations to agent returns for stability and flexibility across a variety of tasks. We show that one of the most successful methods may not work for a stochastic process. As a solution, we introduce a new distributional operator that handles a large class of transformations with guaranteed theoretical convergence. We also propose an approximating single-actor algorithm based on these novel insights, which, when tested, achieves state-of-the-art performance compared to similar algorithms. In Paper III we focus on the issue of efficient exploration in reinforcement learning by studying the regret
DOI:	10.15626/LUD.494.2023
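
Illustration:	The summary describes a discrete dynamical system as a finite set of elements subjected to iteration by a defining function. The following is a minimal sketch of that idea, not the method of the thesis: it follows a single orbit and reports how many steps pass before the orbit enters a cycle and how long that cycle is. The function name `orbit_structure` and the example map `f` on the integers modulo 10 are assumptions chosen purely for illustration. The sketch stores every visited state, so it is only practical for small sets, in contrast to the enumeration-free analysis the summary calls for.

```python
from typing import Callable, Dict, Tuple


def orbit_structure(f: Callable[[int], int], x0: int) -> Tuple[int, int]:
    """Return (tail_length, cycle_length) of the orbit x0, f(x0), f(f(x0)), ...

    Assumes f maps a finite set into itself, so the orbit must eventually repeat.
    """
    first_seen: Dict[int, int] = {}  # state -> step at which it was first visited
    x, step = x0, 0
    while x not in first_seen:
        first_seen[x] = step
        x = f(x)
        step += 1
    tail = first_seen[x]       # steps taken before the cycle is entered
    return tail, step - tail   # remaining steps form the cycle


# Hypothetical example map on the integers modulo 10 (not taken from the thesis).
def f(x: int) -> int:
    return (x * x + 1) % 10


print(orbit_structure(f, 0))  # 0 lies on the cycle 0 -> 1 -> 2 -> 5 -> 6 -> 7 -> 0
print(orbit_structure(f, 3))  # 3 reaches that cycle after one step
```

Starting from 0 the orbit is periodic from the first step, while starting from 3 there is a tail of length one before the same six-element cycle is reached.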