Non-stationary Reinforcement Learning under General Function Approximation
Format: Article
Language: English
Abstract: General function approximation is a powerful tool for handling large state and action spaces in a broad range of reinforcement learning (RL) scenarios. However, the theoretical understanding of non-stationary MDPs with general function approximation is still limited, and in this paper we make the first such attempt. We first propose a new complexity metric for non-stationary MDPs, the dynamic Bellman Eluder (DBE) dimension, which subsumes the majority of existing tractable RL problems in both static and non-stationary MDPs. Based on this complexity metric, we propose a novel confidence-set based model-free algorithm called SW-OPEA, which features a sliding window mechanism and a new confidence set design for non-stationary MDPs. We then establish an upper bound on the dynamic regret of the proposed algorithm and show that SW-OPEA is provably efficient as long as the variation budget is not significantly large. Through examples of non-stationary linear and tabular MDPs, we further demonstrate that our algorithm outperforms existing UCB-type algorithms in the small-variation-budget regime. To the best of our knowledge, this is the first dynamic regret analysis for non-stationary MDPs with general function approximation.
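For readers unfamiliar with the terminology, dynamic regret and the variation budget are standard notions in the non-stationary RL literature. Written in the usual notation (the paper's exact definitions may differ in details such as how drift is measured), they are roughly:

\[
\text{D-Regret}(K) \;=\; \sum_{k=1}^{K} \Big( V_k^{*}(s_1^k) - V_k^{\pi_k}(s_1^k) \Big),
\qquad
\Delta \;=\; \sum_{k=2}^{K} \sup_{s,a} \Big( \big\| P_k(\cdot \mid s,a) - P_{k-1}(\cdot \mid s,a) \big\|_1 + \big| r_k(s,a) - r_{k-1}(s,a) \big| \Big),
\]

where \(V_k^{*}\) is the optimal value of the MDP active in episode \(k\), \(\pi_k\) is the policy deployed in that episode, and \(\Delta\) accumulates the drift of transitions and rewards across episodes.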
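The abstract describes the algorithm only at a high level. Purely as an illustrative aid (this is not the paper's SW-OPEA, whose details are in the full text), the sliding-window confidence-set idea can be sketched as: keep only the most recent W episodes of data, retain the value-function candidates whose empirical Bellman error on that windowed data stays below a confidence radius, and act optimistically among them. All names below (value_class, window_size, beta, bellman_residual) are hypothetical.

```python
from collections import deque

class SlidingWindowConfidenceSet:
    """Illustrative sliding-window confidence set (hypothetical sketch, not the paper's SW-OPEA)."""

    def __init__(self, value_class, window_size, beta):
        self.value_class = value_class            # finite list of candidate value functions
        self.window = deque(maxlen=window_size)   # only the W most recent episodes are kept
        self.beta = beta                          # confidence radius

    def add_episode(self, trajectory):
        # Append one episode of (s, a, r, s') tuples; stale episodes fall out of the window.
        self.window.append(trajectory)

    def confidence_set(self, bellman_residual):
        # Keep candidates whose squared empirical Bellman error on the windowed data is small.
        return [
            f for f in self.value_class
            if sum(bellman_residual(f, tau) ** 2
                   for episode in self.window for tau in episode) <= self.beta
        ]

    def optimistic_choice(self, bellman_residual, initial_state):
        # Optimism: among the surviving candidates, pick the one promising the highest initial value.
        candidates = self.confidence_set(bellman_residual) or list(self.value_class)
        return max(candidates, key=lambda f: f(initial_state))
```

Intuitively, discarding data older than the window limits the bias introduced by drifting transitions and rewards, which is why the resulting dynamic regret can be controlled in terms of the variation budget.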
DOI: 10.48550/arxiv.2306.00861