The ODE Method for Stochastic Approximation and Reinforcement Learning with Markovian Noise
Format: Article
Language: English
Abstract:
Stochastic approximation is a class of algorithms that update a vector iteratively, incrementally, and stochastically, including, e.g., stochastic gradient descent and temporal difference learning. One fundamental challenge in analyzing a stochastic approximation algorithm is to establish its stability, i.e., to show that the stochastic vector iterates are bounded almost surely. In this paper, we extend the celebrated Borkar-Meyn theorem for stability from the martingale difference noise setting to the Markovian noise setting, which greatly improves its applicability in reinforcement learning, especially to off-policy reinforcement learning algorithms with linear function approximation and eligibility traces. Central to our analysis is the diminishing asymptotic rate of change of a few functions, which is implied both by a form of the strong law of large numbers and by a commonly used V4 Lyapunov drift condition, and which holds trivially if the Markov chain is finite and irreducible.
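To make the setting concrete, below is a minimal, hypothetical sketch (not the paper's code) of a stochastic approximation iterate driven by Markovian noise: linear on-policy TD(0) on a small finite, irreducible Markov chain with diminishing step sizes. All names, dimensions, and parameters are illustrative assumptions; the paper's results concern the broader off-policy setting with eligibility traces.

```python
import numpy as np

# Illustrative sketch: linear TD(0) as a stochastic approximation iterate
#   w_{n+1} = w_n + alpha_n * (r(s) + gamma * phi(s')^T w_n - phi(s)^T w_n) * phi(s),
# where (s, s') are successive states of a Markov chain, so the noise is
# Markovian rather than a martingale difference sequence.

rng = np.random.default_rng(0)

n_states, d = 5, 3
P = rng.dirichlet(np.ones(n_states), size=n_states)  # random stochastic matrix; all-positive rows => irreducible
r = rng.normal(size=n_states)                        # reward for each state
Phi = rng.normal(size=(n_states, d))                 # linear features phi(s)
gamma = 0.9

w = np.zeros(d)
s = 0
for n in range(1, 100_001):
    s_next = rng.choice(n_states, p=P[s])            # Markovian noise: next state from the chain
    alpha = 1.0 / n                                  # diminishing (Robbins-Monro) step size
    td_error = r[s] + gamma * Phi[s_next] @ w - Phi[s] @ w
    w = w + alpha * td_error * Phi[s]                # incremental stochastic update
    s = s_next

print("TD(0) weight estimate:", w)
```

Stability in the sense of the abstract would mean that the iterates `w` above remain bounded almost surely along the whole trajectory, which is the property the extended Borkar-Meyn theorem is used to establish.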
DOI: 10.48550/arxiv.2401.07844