Good Better Best: Self-Motivated Imitation Learning for noisy Demonstrations
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Imitation Learning (IL) aims to discover a policy by minimizing the
discrepancy between the agent's behavior and expert demonstrations. However, IL
is susceptible to limitations imposed by noisy demonstrations from non-expert
behaviors, presenting a significant challenge due to the lack of supplementary
information to assess their expertise. In this paper, we introduce
Self-Motivated Imitation LEarning (SMILE), a method capable of progressively
filtering out demonstrations collected by policies deemed inferior to the
current policy, eliminating the need for additional information. We utilize the
forward and reverse processes of Diffusion Models to emulate the shift in
demonstration expertise from low to high and vice versa, thereby extracting the
noise information that diffuses expertise. This noise information is then
used to predict the number of diffusion steps between the current policy and
the demonstrators, which we show theoretically to be equivalent to their
expertise gap. We further explain in detail how the predicted diffusion steps
are applied to filter out noisy demonstrations in a self-motivated manner and
provide theoretical grounding for this filtering. Through empirical evaluations on MuJoCo tasks,
we demonstrate that our method is proficient in learning the expert policy
amidst noisy demonstrations, and effectively filters out demonstrations with
expertise inferior to the current policy. |
---|---|
DOI: | 10.48550/arxiv.2310.15815 |
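
The filtering idea described in the summary can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the noise schedule `betas`, and the sign convention (a positive predicted step count means the demonstrator is estimated to be noisier than the current policy) are all assumptions made for illustration. The forward step uses the standard closed-form Gaussian diffusion from the general diffusion-model literature, and the step predictor itself is left out, since the record does not describe it.

```python
import numpy as np


def forward_diffuse(actions, t, betas, rng):
    """Noise `actions` by t steps of a standard Gaussian forward diffusion.

    Assumption: expertise loss is modelled as DDPM-style noising of actions;
    `betas` is a hypothetical variance schedule, not one taken from the paper.
    """
    # Cumulative signal retention after t steps (1.0 means no noise added).
    alpha_bar = np.cumprod(1.0 - betas)[t - 1] if t > 0 else 1.0
    noise = rng.standard_normal(actions.shape)
    return np.sqrt(alpha_bar) * actions + np.sqrt(1.0 - alpha_bar) * noise


def filter_demonstrations(demos, predicted_steps, threshold=0):
    """Keep demonstrations whose predicted step gap is at most `threshold`.

    Assumption: `predicted_steps[i]` is the estimated number of diffusion
    steps separating the current policy from the demonstrator of `demos[i]`,
    with positive values meaning the demonstrator is the noisier of the two.
    """
    return [d for d, s in zip(demos, predicted_steps) if s <= threshold]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    betas = np.linspace(1e-4, 0.02, 10)  # hypothetical schedule
    clean = np.ones((4, 2))
    noisy = forward_diffuse(clean, 5, betas, rng)
    kept = filter_demonstrations(["a", "b", "c"], [0, 3, -1])
    print(noisy.shape, kept)
```

In this sketch the filter would be re-applied as the policy improves, so that demonstrations judged inferior to the current policy are dropped progressively rather than all at once.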