Finite Sample Analysis of Minmax Variant of Offline Reinforcement Learning for General MDPs

Bibliographic Details
Published in: IEEE Open Journal of Control Systems, 2022, Vol. 1, pp. 152-163
Authors: Regatti, Jayanth Reddy; Gupta, Abhishek
Format: Article
Language: English
Online access: Full text
Abstract: In this work, we analyze finite-sample complexity bounds for offline reinforcement learning with general state spaces, general function classes, and state-dependent action sets. Unlike earlier works, the algorithm analyzed does not require knowledge of the data-collection policy. We show that one can compute an $\epsilon$-optimal Q function (state-action value function) using $O(1/\epsilon^{4})$ i.i.d. samples of state-action-reward-next state tuples.
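To make the kind of minimax objective referred to in the abstract concrete, the following Python sketch estimates a worst-case empirical Bellman residual from i.i.d. (state, action, reward, next state) tuples, with both the Q function and the test functions drawn from a small linear class. All names (phi, q_value, gamma, the toy batch) and the choice of linear features are hypothetical assumptions for illustration; this is not the paper's algorithm or its analysis.

```python
import numpy as np

gamma = 0.9  # assumed discount factor


def phi(state, action):
    """Hypothetical feature map for a state-action pair."""
    return np.array([1.0, state, action, state * action])


def q_value(weights, state, action):
    """Linear state-action value: Q(s, a) = phi(s, a) . weights."""
    return phi(state, action) @ weights


def bellman_residual(q_weights, f_weights, batch, actions):
    """Average of f(s, a) * (r + gamma * max_a' Q(s', a') - Q(s, a)) over the batch.

    A minimax estimator would choose Q to minimize the worst case of this
    quantity over test functions f in the function class.
    """
    total = 0.0
    for (s, a, r, s_next) in batch:
        target = r + gamma * max(q_value(q_weights, s_next, ap) for ap in actions)
        td_error = target - q_value(q_weights, s, a)
        total += q_value(f_weights, s, a) * td_error  # f from the same linear class
    return total / len(batch)


# Toy usage: inner maximization over a small set of test-function weights.
batch = [(0.0, 0, 1.0, 0.5), (0.5, 1, 0.0, 1.0)]  # (s, a, r, s') tuples
actions = [0, 1]
q_w = np.zeros(4)
worst_case = max(
    abs(bellman_residual(q_w, f_w, batch, actions))
    for f_w in (np.eye(4)[i] for i in range(4))
)
print("worst-case empirical Bellman residual:", worst_case)
```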
ISSN: 2694-085X
DOI: 10.1109/OJCSYS.2022.3198660