PID Accelerated Temporal Difference Algorithms
Main authors: | , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Long-horizon tasks, which have a large discount factor, pose a challenge for most conventional reinforcement learning (RL) algorithms. Algorithms such as Value Iteration and Temporal Difference (TD) learning have a slow convergence rate and become inefficient in these tasks. When the transition distributions are given, PID VI was recently introduced to accelerate the convergence of Value Iteration using ideas from control theory. Inspired by this, we introduce PID TD Learning and PID Q-Learning algorithms for the RL setting, in which only samples from the environment are available. We give a theoretical analysis of the convergence of PID TD Learning and its acceleration compared to conventional TD Learning. We also introduce a method for adapting PID gains in the presence of noise and empirically verify its effectiveness. |
DOI: | 10.48550/arxiv.2407.08803 |
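
The abstract describes PID TD Learning only at a high level. As a rough illustration of the kind of update it refers to, the sketch below applies proportional, integral, and derivative corrections to the sampled TD error in a tabular policy-evaluation loop, in the spirit of the PID VI update it cites. The gain names (`kappa_p`, `kappa_i`, `kappa_d`), the decay factor `beta`, the step size `alpha`, and the `env_step`/`reset`/`policy` callables are assumptions chosen for illustration, not details taken from the paper.

```python
import numpy as np

def pid_td_learning(env_step, reset, policy, num_states, gamma=0.99,
                    alpha=0.05, kappa_p=1.0, kappa_i=0.05, kappa_d=0.2,
                    beta=0.95, num_steps=100_000):
    """Sketch of tabular PID-accelerated TD learning for policy evaluation.

    Assumes integer-indexed states. `env_step(s, a)` returns (s_next, reward, done),
    `reset()` returns an initial state, and `policy(s)` returns an action.
    """
    V = np.zeros(num_states)        # current value estimate
    V_prev = np.zeros(num_states)   # previous estimate at each state (derivative term)
    z = np.zeros(num_states)        # running accumulation of TD errors (integral term)

    s = reset()
    for _ in range(num_steps):
        a = policy(s)
        s_next, r, done = env_step(s, a)

        # Sampled TD error: a stochastic estimate of the Bellman residual at s.
        delta = r + (0.0 if done else gamma * V[s_next]) - V[s]

        # Integral term: discounted accumulation of past TD errors at s.
        z[s] = beta * z[s] + delta

        # Derivative term: change in the value estimate at s since its last update.
        deriv = V[s] - V_prev[s]

        # PID update: proportional + integral + derivative corrections.
        V_prev[s] = V[s]
        V[s] += alpha * (kappa_p * delta + kappa_i * z[s] + kappa_d * deriv)

        s = reset() if done else s_next
    return V
```

As a sanity check on the sketch, setting `kappa_p = 1` and `kappa_i = kappa_d = 0` reduces the update to ordinary TD(0).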