Value function estimators for Feynman–Kac forward–backward SDEs in stochastic optimal control

Bibliographic details
Published in:	Automatica (Oxford) 2023-12, Vol.158, p.111281, Article 111281
Main authors:	Hawkins, Kelsey P.; Pakniyat, Ali; Tsiotras, Panagiotis
Format:	Article
Language:	English
Subjects:	Generalized solutions of Hamilton–Jacobi equations; Monte Carlo methods; Non-linear control systems; Parametric optimization; Stochastic control and game theory; Stochastic optimal control problems
Online access:	Full text

Description
Abstract:	Two novel numerical estimators are proposed for solving forward–backward stochastic differential equations (FBSDEs) appearing in the Feynman–Kac representation of the value function in stochastic optimal control problems. In contrast to the current numerical approaches, which are based on the discretization of the continuous-time FBSDE, we propose a converse approach, namely, we obtain a discrete-time approximation of the value function, and then we derive a discrete-time estimator that resembles the continuous-time counterpart. The proposed approach allows for the construction of higher accuracy estimators along with an error analysis. The approach is applied to the policy improvement step in a reinforcement learning framework. Numerical results, along with the corresponding error analysis, demonstrate that the proposed estimators show significant improvement in terms of accuracy over classical Euler–Maruyama-based estimators. In the case of LQ problems, we demonstrate that our estimators result in near machine-precision level accuracy, in contrast to previously proposed methods that can potentially diverge on the same problems.
ISSN:	0005-1098, 1873-2836
DOI:	10.1016/j.automatica.2023.111281