Deep estimation for Q⁎ with minimax Bellman error minimization


Bibliographic details
Published in: Information Sciences 2023-11, Vol. 648, p. 119565, Article 119565
Main authors: Kang, Lican; Liao, Xu; Liu, Jin; Luo, Yuan
Format: Article
Language: English
Online access: Full text
Description
Abstract: In this paper, we consider the estimation of the optimal state-action value function Q⁎ with a ReLU ResNet based on minimax Bellman error minimization. We establish non-asymptotic error bounds for the minimax estimator and for the Q function induced by the estimated greedy policy. To bound the Bellman residual error, we control the approximation errors using deep approximation theory and the statistical errors using empirical process techniques that account for the dependence induced by the Markov decision process. We provide a novel generalization bound for dependent data and an approximation bound in the Hölder class, both of which are of independent interest. The bound depends on the sample size, the ambient dimension, and the width and depth of the neural network, which offers prior insight into tuning these hyper-parameters to achieve a desired convergence rate in practice. Furthermore, the bound circumvents the curse of dimensionality if the distribution of state-action pairs is supported on a set of low intrinsic dimension.
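The minimax formulation in the abstract can be made concrete with a short sketch. The code below is a minimal PyTorch illustration, not the authors' implementation: it implements one common minimax Bellman error objective, min over Q and max over a dual network g of E[2 g(s,a)(Q(s,a) − r − γ max_{a'} Q(s',a')) − g(s,a)²], whose inner maximum recovers the squared Bellman error while avoiding the double-sampling bias. The ReLU ResNet sizes, optimizers, discount factor, and batch layout (s, a, r, s', done) are illustrative assumptions.

```python
# Minimal sketch of minimax Bellman error minimization with ReLU ResNets.
# All architecture sizes, learning rates, and the batch interface are
# illustrative assumptions, not the paper's specification.
import torch
import torch.nn as nn

class ResBlock(nn.Module):
    def __init__(self, width):
        super().__init__()
        self.fc1 = nn.Linear(width, width)
        self.fc2 = nn.Linear(width, width)

    def forward(self, x):
        # Identity skip connection around a two-layer ReLU block.
        return x + self.fc2(torch.relu(self.fc1(x)))

class ReLUResNet(nn.Module):
    """ReLU ResNet mapping a state to one value per discrete action."""
    def __init__(self, state_dim, num_actions, width=64, depth=3):
        super().__init__()
        self.inp = nn.Linear(state_dim, width)
        self.blocks = nn.Sequential(*[ResBlock(width) for _ in range(depth)])
        self.out = nn.Linear(width, num_actions)

    def forward(self, s):
        return self.out(self.blocks(torch.relu(self.inp(s))))

def minimax_bellman_step(q_net, g_net, q_opt, g_opt, batch, gamma=0.99):
    """One ascent step on the dual network g, then one descent step on Q,
    for the objective E[2 g(s,a) * delta - g(s,a)^2] with
    delta = Q(s,a) - r - gamma * max_a' Q(s',a')."""
    s, a, r, s_next, done = batch  # a: long tensor, done: float tensor
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        # Bootstrapped target, treated as fixed within this step.
        target = r + gamma * (1.0 - done) * q_net(s_next).max(dim=1).values
    delta = q_sa - target

    # Inner maximization over g (ascent on the objective).
    g_sa = g_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    g_loss = -(2.0 * g_sa * delta.detach() - g_sa ** 2).mean()
    g_opt.zero_grad()
    g_loss.backward()
    g_opt.step()

    # Outer minimization over Q against the updated dual network.
    g_sa = g_net(s).gather(1, a.unsqueeze(1)).squeeze(1).detach()
    q_loss = (2.0 * g_sa * delta - g_sa ** 2).mean()
    q_opt.zero_grad()
    q_loss.backward()
    q_opt.step()
    return q_loss.item()
```

Alternating an ascent step on g with a descent step on Q is one simple way to approximate the saddle point; holding the bootstrapped target fixed within each step is a practical simplification relative to the full theoretical formulation analyzed in the paper.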
ISSN: 0020-0255, 1872-6291
DOI: 10.1016/j.ins.2023.119565