Conditional Kernel Imitation Learning for Continuous State Environments
Main Authors:
Format: Article
Language: English
Subjects:
Online Access: Order full text
Abstract: Imitation Learning (IL) is an important paradigm within the broader
reinforcement learning (RL) methodology. Unlike most of RL, it does not assume
availability of reward-feedback. Reward inference and shaping are known to be
difficult and error-prone, particularly when the demonstration data
comes from human experts. Classical methods such as behavioral cloning and
inverse reinforcement learning are highly sensitive to estimation errors, a
problem that is particularly acute in continuous state space problems.
Meanwhile, state-of-the-art IL algorithms convert behavioral policy learning
problems into distribution-matching problems, which often require additional
online interaction data to be effective. In this paper, we consider the problem
of imitation learning in continuous state space environments based solely on
observed behavior, without access to transition dynamics information, reward
structure, or, most importantly, any additional interactions with the
environment. Our approach is based on the Markov balance equation and
introduces a novel conditional kernel density estimation-based imitation
learning framework. It involves estimating the environment's transition
dynamics using conditional kernel density estimators and seeks to satisfy the
probabilistic balance equations for the environment. We establish that our
estimators satisfy basic asymptotic consistency requirements. Through a series
of numerical experiments on continuous state benchmark environments, we show
consistently superior empirical performance over many state-of-the-art IL
algorithms.
DOI: 10.48550/arxiv.2308.12573
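The abstract describes estimating the environment's transition dynamics with conditional kernel density estimators. As a rough, non-authoritative illustration of that idea, the sketch below implements a Nadaraya-Watson style conditional kernel density estimate of p(s' | s, a) from demonstration transitions. The class name `ConditionalKDE`, the Gaussian kernel, and the bandwidths `h_x`, `h_y` are assumptions made for illustration only; they are not taken from the paper and do not reproduce its exact estimator or its balance-equation-based learning procedure.

```python
import numpy as np

# Minimal sketch (assumptions, not the paper's method): a Nadaraya-Watson style
# conditional kernel density estimator for transition dynamics p(s' | s, a),
# fit purely from offline demonstration transitions (s, a, s').

def gaussian_kernel(u):
    """Product Gaussian kernel evaluated row-wise on scaled differences u."""
    return np.exp(-0.5 * np.sum(u ** 2, axis=-1))

class ConditionalKDE:
    """Estimate p(s' | s, a) from demonstration transitions (s, a, s')."""

    def __init__(self, states, actions, next_states, h_x=0.5, h_y=0.5):
        # The (s, a) pair is the conditioning variable; s' is the target.
        self.x = np.hstack([states, actions])
        self.y = next_states
        self.h_x, self.h_y = h_x, h_y  # bandwidths; would be tuned in practice

    def density(self, s, a, s_next):
        """Conditional density estimate at the query point (s, a, s_next)."""
        x_query = np.hstack([s, a])
        w = gaussian_kernel((self.x - x_query) / self.h_x)   # weights over (s, a)
        k = gaussian_kernel((self.y - s_next) / self.h_y)    # kernel over s'
        d = self.y.shape[1]
        norm = (2 * np.pi) ** (d / 2) * self.h_y ** d        # Gaussian normalization in s'
        return np.sum(w * k) / (norm * np.sum(w) + 1e-12)

# Toy usage on synthetic 1-D transitions.
rng = np.random.default_rng(0)
S = rng.normal(size=(200, 1))
A = rng.normal(size=(200, 1))
S_next = S + 0.1 * A + 0.05 * rng.normal(size=(200, 1))
kde = ConditionalKDE(S, A, S_next)
print(kde.density(np.zeros(1), np.zeros(1), np.zeros(1)))
```

In the paper's framework such an estimated transition density would feed into the probabilistic (Markov) balance equations that the learned imitation policy is required to satisfy; the sketch above only covers the density-estimation ingredient.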