An Optimal Tightness Bound for the Simulation Lemma
Format: | Article |
Language: | English |
Abstract: | We present a bound for value-prediction error with respect to model
misspecification that is tight, including constant factors. This is a direct
improvement of the "simulation lemma," a foundational result in reinforcement
learning. We demonstrate that existing bounds are quite loose, becoming vacuous
for large discount factors, due to the suboptimal treatment of compounding
probability errors. By carefully considering this quantity on its own, instead
of as a subcomponent of value error, we derive a bound that is sub-linear with
respect to transition function misspecification. We then demonstrate broader
applicability of this technique, improving a similar bound in the related
subfield of hierarchical abstraction. |
DOI: | 10.48550/arxiv.2406.16249 |
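The abstract does not restate the classic simulation lemma it improves on. As a rough illustration only, the Python sketch below compares the exact value-prediction error of a fixed policy in a toy two-state chain against one standard form of the classic bound, gamma * eps * R_max / (1 - gamma)^2, obtained from the Bellman equations with eps the largest per-state L1 transition error and rewards assumed to lie in [0, R_max]. This is not the tight bound from this paper; the chain, the constants, and the function names (`evaluate`, `l1_mismatch`, `classic_bound`, `trivial_bound`, `compare`) are hypothetical choices, and constant factors in the classic bound differ across references.

```python
# Hypothetical illustration: exact value error vs. one classic simulation-lemma bound.
# Assumes a fixed policy (so P is the induced state-to-state transition matrix),
# a reward vector r shared by both models with entries in [0, r_max], and gamma < 1.


def evaluate(P, r, gamma, tol=1e-10):
    """Iterative policy evaluation: repeat V <- r + gamma * P V until the
    largest per-state change drops below tol."""
    n = len(r)
    V = [0.0] * n
    while True:
        V_new = [r[s] + gamma * sum(P[s][t] * V[t] for t in range(n)) for s in range(n)]
        if max(abs(a - b) for a, b in zip(V_new, V)) < tol:
            return V_new
        V = V_new


def l1_mismatch(P, P_hat):
    """eps = max over states s of ||P(.|s) - P_hat(.|s)||_1."""
    return max(sum(abs(p - q) for p, q in zip(row, row_hat))
               for row, row_hat in zip(P, P_hat))


def classic_bound(eps, gamma, r_max):
    """gamma * eps * r_max / (1 - gamma)^2, from
    V - V_hat = gamma (P - P_hat) V + gamma P_hat (V - V_hat)
    together with ||V||_inf <= r_max / (1 - gamma)."""
    return gamma * eps * r_max / (1 - gamma) ** 2


def trivial_bound(gamma, r_max):
    """Both value functions lie in [0, r_max / (1 - gamma)], so their gap does too."""
    return r_max / (1 - gamma)


def compare(P, P_hat, r, gamma):
    """Return (actual sup-norm value error, classic bound, trivial bound)."""
    V = evaluate(P, r, gamma)
    V_hat = evaluate(P_hat, r, gamma)
    actual = max(abs(a - b) for a, b in zip(V, V_hat))
    r_max = max(r)
    eps = l1_mismatch(P, P_hat)
    return actual, classic_bound(eps, gamma, r_max), trivial_bound(gamma, r_max)


# Toy two-state chain; the model misestimates only state 0's transitions.
P_TRUE = [[0.9, 0.1], [0.1, 0.9]]
P_MODEL = [[0.85, 0.15], [0.1, 0.9]]
REWARDS = [1.0, 0.0]

if __name__ == "__main__":
    for gamma in (0.5, 0.9, 0.99):
        actual, classic, trivial = compare(P_TRUE, P_MODEL, REWARDS, gamma)
        print(f"gamma={gamma}: actual={actual:.4f} classic={classic:.2f} trivial={trivial:.2f}")
```

With eps = 0.1 and R_max = 1, this form of the classic bound is about 9 against a trivial bound of 10 at gamma = 0.9, but about 990 against 100 at gamma = 0.99. That is the kind of vacuousness for large discount factors that the abstract attributes to the treatment of compounding probability errors.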