Optimal dynamic fixed-mix portfolios based on reinforcement learning with second order stochastic dominance

Bibliographic details
Published in: Engineering applications of artificial intelligence, 2024-07, Vol. 133, p. 108599, Article 108599
Main authors: Consigli, Giorgio; Gomez, Alvaro A.; Zubelli, Jorge P.
Format: Article
Language: English
Description
Summary: We propose a reinforcement learning (RL) approach to a multiperiod optimization problem in which a portfolio manager seeks an optimal constant proportion portfolio strategy by minimizing a tail risk measure consistent with second order stochastic dominance (SSD) principles. As a risk measure, we consider in particular the Interval Conditional Value-at-Risk (ICVaR), which has been shown to be mathematically related to SSD principles. By including the ICVaR in the reward function of an RL method, we show that an optimal fixed-mix policy can be derived as the solution of short- to medium-term allocation problems through an accurate specification of the learning parameters under general statistical assumptions. The financial optimization problem thus carries several novel features, and the article details the steps required to accommodate those features within a reinforcement learning architecture. The methodology is tested in- and out-of-sample on market data, showing good performance relative to the S&P 500, adopted as the benchmark policy.
• Solution of a dynamic mean-risk optimization problem based on a constant proportion constraint and second-order stochastic dominance.
• Adoption of a deep reinforcement learning method, whose convergence properties and algorithmic design are discussed.
• In- and out-of-sample validation of the model and methodology on market data over several years.
ISSN: 0952-1976, 1873-6769
DOI: 10.1016/j.engappai.2024.108599
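
Illustrative sketch (not taken from the article): the summary describes a fixed-mix (constant proportion) policy chosen to minimize a tail risk measure. The Python below is a minimal illustration of those two ingredients only, under several assumptions. It uses the standard empirical Conditional Value-at-Risk as a stand-in, since the record does not define the ICVaR, and it replaces the article's reinforcement learning method with a plain grid search over two-asset weights. The function names `simulate_fixed_mix`, `empirical_cvar` and `best_fixed_mix`, the scenario layout and all parameter values are hypothetical.

```python
def simulate_fixed_mix(weights, returns_path, initial_wealth=1.0):
    """Terminal wealth of a portfolio rebalanced to `weights` every period.

    returns_path is a list of periods; each period is a list of simple
    asset returns, one per asset (assumed layout, for illustration only).
    """
    wealth = initial_wealth
    for period_returns in returns_path:
        # With constant proportions restored each period, the portfolio
        # return is the weight-averaged asset return for that period.
        portfolio_return = sum(w * r for w, r in zip(weights, period_returns))
        wealth *= 1.0 + portfolio_return
    return wealth


def empirical_cvar(losses, alpha=0.95):
    """Mean of the worst (1 - alpha) share of losses.

    This is the standard empirical CVaR, used here as a stand-in;
    it is NOT the ICVaR studied in the article.
    """
    ordered = sorted(losses, reverse=True)
    tail_size = max(1, round(len(ordered) * (1.0 - alpha)))
    tail = ordered[:tail_size]
    return sum(tail) / len(tail)


def best_fixed_mix(scenarios, grid_steps=10, alpha=0.95):
    """Grid search (not RL) over two-asset fixed-mix weights.

    Returns the (weights, risk) pair with the smallest empirical CVaR
    of the terminal loss, measured against an initial wealth of 1.
    """
    best = None
    for i in range(grid_steps + 1):
        w = i / grid_steps
        weights = [w, 1.0 - w]
        losses = [1.0 - simulate_fixed_mix(weights, path) for path in scenarios]
        risk = empirical_cvar(losses, alpha)
        if best is None or risk < best[1]:
            best = (weights, risk)
    return best
```

In this sketch each scenario is one simulated multi-period return path, so `best_fixed_mix(scenarios)` returns the weight pair whose terminal-loss distribution has the smallest tail average across scenarios. A learning-based method such as the one the summary refers to would replace the exhaustive grid with a policy trained on a risk-aware reward.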