Expert-Trajectory-Based Features for Apprenticeship Learning via Inverse Reinforcement Learning for Robotic Manipulation

Full Description

Saved in:
Bibliographic Details
Published in: Applied Sciences 2024-12, Vol. 14 (23), p. 11131
Main Authors: Naranjo-Campos, Francisco J., Victores, Juan G., Balaguer, Carlos
Format: Article
Language: English
Subjects:
Online Access: Full text
Description
Summary: This paper explores the application of Inverse Reinforcement Learning (IRL) in robotics, focusing on inferring reward functions from expert demonstrations of robot arm manipulation tasks. By leveraging IRL, we aim to develop efficient and adaptable techniques for learning robust solutions to complex tasks in continuous state spaces. Our approach combines Apprenticeship Learning via IRL with Proximal Policy Optimization (PPO), expert-trajectory-based features, and a reverse discount. The feature space is constructed by sampling expert trajectories to capture essential task characteristics, improving learning efficiency and generalizability by concentrating on critical states. To prevent feature expectations from vanishing in goal states, we introduce reverse discounting, which prioritizes feature expectations in final states. We validate our methodology through experiments in a simple GridWorld environment, demonstrating that reverse discounting improves the alignment of the agent's features with those of the expert. We also examine how the parameters of the proposed feature definition influence performance. Further experiments on robotic manipulation tasks with the TIAGo robot compare our approach against state-of-the-art methods, confirming its effectiveness and adaptability in complex continuous state spaces across diverse manipulation tasks.
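The record does not include the paper's exact formulation, but the reverse-discounting idea described in the abstract can be sketched as follows. In standard apprenticeship learning, feature expectations weight the feature vector of the state at step t by gamma^t, so features observed only near the goal (late in the trajectory) are heavily attenuated. A plausible reverse discount instead weights step t by gamma^(T-1-t), so the final state receives weight 1. The function name and this specific weighting scheme are illustrative assumptions, not the authors' exact definition:

```python
import numpy as np

def feature_expectations(features, gamma=0.9, reverse=False):
    """Discounted feature expectations of a single trajectory.

    features: array of shape (T, d), one feature vector per visited state.
    With reverse=False, step t is weighted by gamma**t (early states dominate).
    With reverse=True, step t is weighted by gamma**(T-1-t), so late (goal)
    states get the largest weight and their features do not vanish.
    NOTE: this weighting is an assumed illustration of "reverse discounting",
    not the exact scheme from the paper.
    """
    T = len(features)
    t = np.arange(T)
    exponents = (T - 1 - t) if reverse else t
    weights = gamma ** exponents
    return weights @ features  # shape (d,)

# Toy trajectory: the second ("goal") feature only fires at the final step.
traj = np.array([
    [1.0, 0.0],
    [1.0, 0.0],
    [0.0, 1.0],  # goal state
])
print(feature_expectations(traj))                # forward: goal weighted by gamma**2 = 0.81
print(feature_expectations(traj, reverse=True))  # reverse: goal weighted by gamma**0 = 1.0
```

With forward discounting the goal feature contributes only 0.81, while with the reverse discount it contributes its full value of 1.0, which matches the abstract's stated goal of prioritizing feature expectations in final states.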
ISSN: 2076-3417
DOI: 10.3390/app142311131