Efficient Reinforcement Learning in Probabilistic Reward Machines
Published in: arXiv.org, 2024-08
Main Authors:
Format: Article
Language: English
Online Access: Full text
Abstract: In this paper, we study reinforcement learning in Markov Decision Processes with Probabilistic Reward Machines (PRMs), a form of non-Markovian reward commonly found in robotics tasks. We design an algorithm for PRMs that achieves a regret bound of \(\widetilde{O}(\sqrt{HOAT} + H^2O^2A^{3/2} + H\sqrt{T})\), where \(H\) is the time horizon, \(O\) is the number of observations, \(A\) is the number of actions, and \(T\) is the number of time-steps. This result improves over the best-known bound, \(\widetilde{O}(H\sqrt{OAT})\), of \citet{pmlr-v206-bourel23a} for MDPs with Deterministic Reward Machines (DRMs), a special case of PRMs. When \(T \geq H^3O^3A^2\) and \(OA \geq H\), our regret bound leads to a regret of \(\widetilde{O}(\sqrt{HOAT})\), which matches the established lower bound of \(\Omega(\sqrt{HOAT})\) for MDPs with DRMs up to a logarithmic factor. To the best of our knowledge, this is the first efficient algorithm for PRMs. Additionally, we present a new simulation lemma for non-Markovian rewards, which enables reward-free exploration for any non-Markovian reward given access to an approximate planner. Complementing our theoretical findings, we show through extensive experimental evaluations that our algorithm outperforms prior methods in various PRM environments.
ISSN: 2331-8422
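
A quick sketch (added here for readability, not part of the record): the two conditions quoted in the abstract are exactly those under which the leading term dominates the stated bound \(\widetilde{O}(\sqrt{HOAT} + H^2O^2A^{3/2} + H\sqrt{T})\). Squaring each comparison gives

\[
\sqrt{HOAT} \;\ge\; H^2 O^2 A^{3/2} \iff HOAT \ge H^4 O^4 A^3 \iff T \ge H^3 O^3 A^2,
\qquad
\sqrt{HOAT} \;\ge\; H\sqrt{T} \iff HOAT \ge H^2 T \iff OA \ge H,
\]

so under both conditions the regret simplifies to \(\widetilde{O}(\sqrt{HOAT})\), matching the \(\Omega(\sqrt{HOAT})\) lower bound up to logarithmic factors.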