The Projected Bellman Equation in Reinforcement Learning
Q-learning has become an important part of the reinforcement learning toolkit since its introduction in the dissertation of Chris Watkins in the 1980s. In the original tabular formulation, the goal is to compute exactly a solution to the discounted-cost optimality equation, and thereby, obtain the o...
Gespeichert in:
Veröffentlicht in: | IEEE transactions on automatic control 2024-12, Vol.69 (12), p.8323-8337 |
---|---|
1. Verfasser: | |
Format: | Artikel |
Sprache: | eng |
Schlagworte: | |
Online-Zugang: | Volltext bestellen |
Tags: |
Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
|
Zusammenfassung: | Q-learning has become an important part of the reinforcement learning toolkit since its introduction in the dissertation of Chris Watkins in the 1980s. In the original tabular formulation, the goal is to compute exactly a solution to the discounted-cost optimality equation, and thereby, obtain the optimal policy for a Markov Decision Process. The goal today is more modest: obtain an approximate solution within a prescribed function class. The standard algorithms are based on the same architecture as formulated in the 1980s, with the goal of finding a value function approximation that solves the so-called projected Bellman equation. While reinforcement learning has been an active research area for over four decades, there is little theory providing conditions for convergence of these Q-learning algorithms, or even existence of a solution to this equation. The purpose of this article is to show that a solution to the projected Bellman equation does exist, provided the function class is linear and the input used for training is a form of \varepsilon-greedy policy with sufficiently small \varepsilon. Moreover, under these conditions it is shown that the Q-learning algorithm is stable, in terms of bounded parameter estimates. Convergence remains one of many open topics for research. |
---|---|
ISSN: | 0018-9286 1558-2523 |
DOI: | 10.1109/TAC.2024.3409647 |