Modeling Bellman-error with logistic distribution with applications in reinforcement learning

Bibliographic Details
Published in: Neural Networks, 2024-09, Vol. 177, p. 106387, Article 106387
Main Authors: Lv, Outongyi; Zhou, Bingxin; Yang, Lin F.
Format: Article
Language: English
Online Access: Full text
Abstract: In modern Reinforcement Learning (RL) approaches, optimizing the Bellman error is a critical element across various algorithms, notably in deep Q-Learning and related methodologies. Traditional approaches predominantly employ the mean-squared Bellman error (MSELoss) as the standard loss function. However, the assumption that Bellman errors follow a Gaussian distribution may oversimplify the nuanced characteristics of RL applications. In this work, we revisit the distribution of the Bellman error in RL training, demonstrating that it tends to follow the Logistic distribution rather than the commonly assumed Normal distribution. We propose replacing MSELoss with a Logistic maximum likelihood function (LLoss) and rigorously test this hypothesis through extensive numerical experiments across diverse online and offline RL environments. Our findings consistently show that integrating the Logistic correction into the loss functions of various baseline RL methods leads to superior performance compared to their MSE counterparts. Additionally, we employ Kolmogorov–Smirnov tests to substantiate that the Logistic distribution offers a more accurate fit for approximating Bellman errors. This study also offers a novel theoretical contribution by establishing a clear connection between the distribution of the Bellman error and the practice of proportional reward scaling, a common technique for performance enhancement in RL. Moreover, we explore the sample-accuracy trade-off involved in approximating the Logistic distribution, leveraging the Bias–Variance decomposition to avoid excessive computational cost. The theoretical and empirical insights presented in this study lay a significant foundation for future research, potentially advancing methodologies and understanding in RL, particularly in the distribution-based optimization of the Bellman error.

Highlights:
• We challenge the common assumption of a Normally distributed Bellman error, modeling it with a Logistic distribution instead.
• We analyze the sampling error of the Logistic distribution via the Bias–Variance decomposition to choose an appropriate batch size.
• We confirm the robustness of the Logistic distribution for the Bellman error through extensive testing and Kolmogorov–Smirnov tests.
• Novelty: we provide the first rigorous Logistic modeling scheme for the distribution of the Bellman error and relate it to the reward-scaling problem.
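As a concrete illustration of the loss substitution described in the abstract, the sketch below shows how a Logistic negative log-likelihood could stand in for the mean-squared Bellman error, together with a Kolmogorov–Smirnov comparison of Normal and Logistic fits to observed errors. This is a minimal sketch, not the authors' reference implementation: the zero-mean, fixed-scale parameterization and the names logistic_nll_loss, mse_bellman_loss, and ks_compare are illustrative assumptions.

```python
# Minimal sketch, assuming PyTorch and SciPy are available; the exact LLoss
# parameterization used in the paper may differ from this zero-mean,
# fixed-scale Logistic model.
import math

import torch
import torch.nn.functional as F
from scipy import stats


def logistic_nll_loss(q_pred, q_target, scale=1.0):
    """Negative log-likelihood of Bellman errors under Logistic(0, scale).

    For z = (target - prediction) / scale, the per-sample NLL is
    log(scale) + z + 2 * softplus(-z), which is symmetric in z.
    """
    z = (q_target.detach() - q_pred) / scale
    return (z + 2.0 * F.softplus(-z)).mean() + math.log(scale)


def mse_bellman_loss(q_pred, q_target):
    """Standard mean-squared Bellman error, kept for comparison."""
    return F.mse_loss(q_pred, q_target.detach())


def ks_compare(bellman_errors):
    """Kolmogorov-Smirnov statistics of Normal vs. Logistic fits to the errors."""
    errors = bellman_errors.detach().cpu().numpy()
    norm_params = stats.norm.fit(errors)
    logistic_params = stats.logistic.fit(errors)
    ks_normal = stats.kstest(errors, "norm", args=norm_params)
    ks_logistic = stats.kstest(errors, "logistic", args=logistic_params)
    return ks_normal, ks_logistic
```

In a deep Q-Learning update, logistic_nll_loss would simply replace the usual MSE term between the online Q-values and the detached Bellman targets, while ks_compare can be run on a buffer of observed Bellman errors to check which distribution fits better, mirroring the Kolmogorov–Smirnov analysis described in the abstract.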
ISSN: 0893-6080
1879-2782
DOI: 10.1016/j.neunet.2024.106387