Lipschitz Lifelong Reinforcement Learning
Saved in:
Main Authors: | |
---|---|
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Order full text |
Summary: | We consider the problem of knowledge transfer when an agent is facing a series of Reinforcement Learning (RL) tasks. We introduce a novel metric between Markov Decision Processes (MDPs) and establish that close MDPs have close optimal value functions. Formally, the optimal value functions are Lipschitz continuous with respect to the task space. These theoretical results lead us to a value-transfer method for Lifelong RL, which we use to build a PAC-MDP algorithm with an improved convergence rate. Further, we show that, with high probability, the method incurs no negative transfer. We illustrate the benefits of the method in Lifelong RL experiments. |
DOI: | 10.48550/arxiv.2001.05411 |
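The summary's central claim, that optimal value functions vary Lipschitz-continuously with the task, can be sketched in a much simpler setting than the paper's own metric: a classical perturbation bound says that if two MDPs share dynamics and their rewards differ uniformly by eps, their optimal values differ by at most eps / (1 - gamma). The snippet below (all names and the random instance are illustrative, not from the paper) checks this numerically with value iteration on two nearby tabular MDPs.

```python
import numpy as np

def value_iteration(P, R, gamma=0.9, tol=1e-10):
    """Solve a tabular MDP. P: (S, A, S) transitions, R: (S, A) rewards."""
    V = np.zeros(P.shape[0])
    while True:
        Q = R + gamma * (P @ V)        # (S, A): one-step backup
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new
        V = V_new

# Two 2-state, 2-action MDPs: identical dynamics, rewards shifted by eps.
rng = np.random.default_rng(0)
P = rng.dirichlet(np.ones(2), size=(2, 2))   # rows sum to 1 over next states
R1 = rng.uniform(0.0, 1.0, size=(2, 2))
eps, gamma = 0.01, 0.9
R2 = R1 + eps                                 # uniformly perturbed task

V1 = value_iteration(P, R1, gamma)
V2 = value_iteration(P, R2, gamma)
gap = np.max(np.abs(V1 - V2))
print(gap)  # ≈ eps / (1 - gamma) = 0.1: a uniform shift moves V* by exactly that much
```

Because the reward shift here is uniform, the bound is attained exactly; the paper's contribution is a metric under which an analogous Lipschitz bound holds across genuinely different tasks, enabling value transfer.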