Finite-Time Analysis of Simultaneous Double Q-learning
Format: Article
Language: English
Abstract: $Q$-learning is one of the most fundamental reinforcement learning (RL) algorithms. Despite its widespread success in various applications, it is prone to overestimation bias in the $Q$-learning update. To address this issue, double $Q$-learning employs two independent $Q$-estimators which are randomly selected and updated during the learning process. This paper proposes a modified double $Q$-learning, called simultaneous double $Q$-learning (SDQ), with its finite-time analysis. SDQ eliminates the need for random selection between the two $Q$-estimators, and this modification allows us to analyze double $Q$-learning through the lens of a novel switching system framework, facilitating efficient finite-time analysis. Empirical studies demonstrate that SDQ converges faster than double $Q$-learning while retaining the ability to mitigate the maximization bias. Finally, we derive a finite-time expected error bound for SDQ.
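
The precise update rules are given in the paper itself; the sketch below is only a hypothetical reading of the abstract's description, not the authors' exact method. `double_q_step` performs the classic coin-flip update of double $Q$-learning, while `simultaneous_double_q_step` assumes both estimators are updated at every step using each other's values, which is how the abstract describes SDQ removing the random selection. The function names, step-size handling, and the exact form of the simultaneous targets are assumptions for illustration.

```python
import numpy as np

def double_q_step(QA, QB, s, a, r, s_next, alpha, gamma, rng):
    # Classic double Q-learning: a coin flip picks which estimator to update,
    # and the target evaluates the *other* estimator at the chosen greedy action.
    if rng.random() < 0.5:
        a_star = int(np.argmax(QA[s_next]))
        QA[s, a] += alpha * (r + gamma * QB[s_next, a_star] - QA[s, a])
    else:
        b_star = int(np.argmax(QB[s_next]))
        QB[s, a] += alpha * (r + gamma * QA[s_next, b_star] - QB[s, a])

def simultaneous_double_q_step(QA, QB, s, a, r, s_next, alpha, gamma):
    # Assumed "simultaneous" variant: no random selection; both estimators are
    # updated at every step with cross-estimator targets.
    a_star = int(np.argmax(QA[s_next]))
    b_star = int(np.argmax(QB[s_next]))
    td_a = r + gamma * QB[s_next, a_star] - QA[s, a]
    td_b = r + gamma * QA[s_next, b_star] - QB[s, a]
    QA[s, a] += alpha * td_a
    QB[s, a] += alpha * td_b

# Toy usage on a random 5-state, 3-action table.
rng = np.random.default_rng(0)
QA = np.zeros((5, 3))
QB = np.zeros((5, 3))
simultaneous_double_q_step(QA, QB, s=0, a=1, r=1.0, s_next=2, alpha=0.1, gamma=0.99)
```

Computing both temporal-difference errors before applying either update keeps the two updates based on the same snapshot of the estimators, which matches the "simultaneous" reading of the abstract.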
DOI: 10.48550/arxiv.2406.09946