Reinforcement learning with model-based feedforward inputs for robotic table tennis



Bibliographic Details
Published in: Autonomous Robots, 2023-12, Vol. 47 (8), p. 1387-1403
Main authors: Ma, Hao; Büchler, Dieter; Schölkopf, Bernhard; Muehlebach, Michael
Format: Article
Language: English
Keywords:
Online access: Full text
Description
Abstract: We rethink the traditional reinforcement learning approach, which is based on optimizing over feedback policies, and propose a new framework that optimizes over feedforward inputs instead. This not only mitigates the risk of destabilizing the system during training but also reduces the bulk of the learning to a supervised learning task. As a result, efficient and well-understood supervised learning techniques can be applied and are tuned using a validation data set. The labels are generated with a variant of iterative learning control, which also includes prior knowledge about the underlying dynamics. Our framework is applied to intercepting and returning ping-pong balls that are played to a four-degree-of-freedom robotic arm in real-world experiments. The robot arm is driven by pneumatic artificial muscles, which makes the control and learning tasks challenging. We highlight the potential of our framework by comparing it to a reinforcement learning approach that optimizes over feedback policies. We find that our framework achieves a higher success rate for the returns (100% vs. 96%, on 107 consecutive trials, see https://youtu.be/kR9jowEH7PY) while requiring only about one tenth of the samples during training. We also find that our approach is able to deal with a variety of different incoming trajectories.
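
The abstract only outlines the pipeline (iterative learning control generates feedforward-input labels, then supervised learning maps incoming-ball information to those inputs). The sketch below is a rough illustration of that idea, not the authors' implementation: the learning gain L_GAIN, the trial length T, the feature construction, and the stand-in function simulate_tracking_error are all assumptions made here for demonstration.

```python
# Minimal sketch (assumed setup, not the paper's code): label generation with a
# simple iterative learning control (ILC) update, followed by supervised
# regression from incoming-trajectory features to feedforward inputs.
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)

T = 50        # time steps per trial (assumed)
L_GAIN = 0.5  # ILC learning gain (assumed)

def simulate_tracking_error(u, features):
    """Hypothetical stand-in for running one trial and measuring the tracking
    error of the interception motion under the feedforward input u."""
    reference = np.sin(np.linspace(0.0, np.pi, T)) * features[0]
    return reference - u  # error = reference minus applied feedforward input

def ilc_label(features, iterations=20):
    """Iterate u <- u + L * e until the feedforward input tracks the reference."""
    u = np.zeros(T)
    for _ in range(iterations):
        e = simulate_tracking_error(u, features)
        u = u + L_GAIN * e
    return u

# Labeled data set: incoming-ball features -> converged feedforward input.
features = rng.uniform(0.5, 1.5, size=(200, 3))
labels = np.stack([ilc_label(f) for f in features])

# Supervised learning step: a ridge regressor predicts feedforward inputs for
# unseen incoming trajectories; the held-out split mimics the validation set.
model = Ridge(alpha=1e-3).fit(features[:150], labels[:150])
print("validation MSE:", np.mean((model.predict(features[150:]) - labels[150:]) ** 2))
```

Because the ILC loop only needs repeated trials rather than online policy updates, the risky exploration of feedback-policy optimization is avoided and the remaining generalization problem is handled by a standard regressor, which matches the division of labor described in the abstract.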
ISSN: 0929-5593, 1573-7527
DOI: 10.1007/s10514-023-10140-6