Data-Efficient Task Generalization via Probabilistic Model-based Meta Reinforcement Learning

Bibliographic details
Published in:	IEEE Robotics and Automation Letters 2024-04, Vol.9 (4), p.1-8
Main authors:	Bhardwaj, Arjun; Rothfuss, Jonas; Sukhija, Bhavya; As, Yarden; Hutter, Marco; Coros, Stelian; Krause, Andreas
Format:	Article
Language:	English
Subjects:	Adaptation; Adaptation models; Algorithms; Artificial neural networks; Bayes methods; Data collection; Dynamics; Learning from Experience; Machine learning; Metalearning; Model Learning for Control; Probabilistic models; Regularization; Reinforcement Learning; Robot learning; Robotics; Robots; Task analysis; Uncertainty

Description
Abstract:	We introduce PACOH-RL, a novel model-based Meta-Reinforcement Learning (Meta-RL) algorithm designed to efficiently adapt control policies to changing dynamics. PACOH-RL meta-learns priors for the dynamics model, allowing swift adaptation to new dynamics with minimal interaction data. Existing Meta-RL methods require abundant meta-learning data, limiting their applicability in settings such as robotics, where data is costly to obtain. To address this, PACOH-RL incorporates regularization and epistemic uncertainty quantification in both the meta-learning and task adaptation stages. When facing new dynamics, we use these uncertainty estimates to effectively guide exploration and data collection. Overall, this enables positive transfer, even when access to data from prior tasks or dynamic settings is severely limited. Our experimental results demonstrate that PACOH-RL outperforms model-based RL and model-based Meta-RL baselines in adapting to new dynamic conditions. Finally, on a real robotic car, we showcase the potential for efficient RL policy adaptation in diverse, data-scarce conditions.
ISSN:	2377-3766
DOI:	10.1109/LRA.2024.3371260
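
Illustrative note: the abstract describes two ideas, an informative prior over the dynamics model that speeds up adaptation from few samples, and epistemic uncertainty that guides where to collect data next. The toy sketch below illustrates both with conjugate Bayesian linear regression in NumPy. It is not the PACOH-RL algorithm and is not taken from the paper: the linear model, the hand-set "meta-learned" prior mean, the noise level, and the helper names `posterior`, `epistemic_var`, and `most_uncertain` are all assumptions chosen for illustration.

```python
import numpy as np


def posterior(X, y, prior_mean, prior_var, noise_var):
    """Gaussian posterior over weights w for y = X @ w + noise.

    Assumes an isotropic Gaussian prior N(prior_mean, prior_var * I)
    and Gaussian observation noise with variance noise_var.
    """
    d = X.shape[1]
    prior_prec = np.eye(d) / prior_var
    post_cov = np.linalg.inv(prior_prec + X.T @ X / noise_var)
    post_mean = post_cov @ (prior_prec @ prior_mean + X.T @ y / noise_var)
    return post_mean, post_cov


def epistemic_var(x, post_cov):
    """Predictive variance at input x from weight uncertainty only."""
    return float(x @ post_cov @ x)


def most_uncertain(candidates, post_cov):
    """Index of the candidate input with the largest epistemic variance."""
    variances = [epistemic_var(x, post_cov) for x in candidates]
    return int(np.argmax(variances))


# Hypothetical 1-D dynamics: next_state = 0.9 * state + 0.5 * action.
rng = np.random.default_rng(0)
true_w = np.array([0.9, 0.5])
noise_std = 0.05

# Only three transitions from the new task (the data-scarce regime).
X = rng.normal(size=(3, 2))  # columns: state, action
y = X @ true_w + noise_std * rng.normal(size=3)

# Uninformative prior versus a stand-in for a prior learned on earlier tasks.
broad_mean, broad_cov = posterior(X, y, np.zeros(2), 10.0, noise_std**2)
meta_mean, meta_cov = posterior(X, y, np.array([0.8, 0.6]), 0.05, noise_std**2)

# Pick the next (state, action) query where the model is least certain.
candidates = rng.normal(size=(5, 2))
next_query = candidates[most_uncertain(candidates, meta_cov)]
```

Under these assumptions, the tighter prior always yields a smaller epistemic variance at any nonzero query point than the broad prior given the same data, which is the sense in which a well-chosen prior reduces the interaction data needed. The paper's method works with richer probabilistic dynamics models and meta-learns the prior rather than setting it by hand.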