An Efficient Image-to-Image Translation HourGlass-based Architecture for Object Pushing Policy Learning
Saved in:

Main authors: , ,
Format: Article
Language: eng
Subjects:
Online access: Order full text
Abstract:

Humans effortlessly solve pushing tasks in everyday life but unlocking these capabilities remains a challenge in robotics because physics models of these tasks are often inaccurate or unattainable. State-of-the-art data-driven approaches learn to compensate for these inaccuracies or replace the approximated physics models altogether. Nevertheless, approaches like Deep Q-Networks (DQNs) suffer from local optima in large state-action spaces. Furthermore, they rely on well-chosen deep learning architectures and learning paradigms. In this paper, we propose to frame the learning of pushing policies (where to push and how) by DQNs as an image-to-image translation problem and exploit an Hourglass-based architecture. We present an architecture combining a predictor of which pushes lead to changes in the environment with a state-action value predictor dedicated to the pushing task. Moreover, we investigate positional information encoding to learn position-dependent policy behaviors. We demonstrate in simulation experiments with a UR5 robot arm that our overall architecture helps the DQN learn faster and achieve higher performance in a pushing task involving objects with unknown dynamics.
DOI: 10.48550/arxiv.2108.01034
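The abstract frames pushing policy learning as image-to-image translation with an hourglass (encoder-decoder) network and mentions positional information encoding. The sketch below illustrates that general idea in PyTorch: a minimal encoder-decoder that maps a workspace image to a per-pixel, per-direction Q-value map, with normalized coordinate channels standing in for the positional encoding. Layer sizes, the number of push directions, and the coordinate-channel scheme are assumptions chosen for illustration, not the authors' implementation; the paper's separate predictor of which pushes change the environment is omitted for brevity.

```python
# Minimal sketch (not the paper's code): an hourglass-style encoder-decoder
# that translates an overhead workspace image into a per-pixel, per-direction
# pushing Q-value map. Coordinate channels approximate positional encoding.
import torch
import torch.nn as nn


def add_coord_channels(x: torch.Tensor) -> torch.Tensor:
    """Append normalized (x, y) coordinate channels to an image batch."""
    b, _, h, w = x.shape
    ys = torch.linspace(-1.0, 1.0, h, device=x.device)
    xs = torch.linspace(-1.0, 1.0, w, device=x.device)
    grid_y, grid_x = torch.meshgrid(ys, xs, indexing="ij")
    coords = torch.stack((grid_x, grid_y)).expand(b, -1, -1, -1)
    return torch.cat((x, coords), dim=1)


class HourglassQMap(nn.Module):
    """Encoder-decoder ('hourglass') predicting one Q-value per pixel and
    push direction, i.e. image-to-image translation of the state image
    into a state-action value map (hypothetical layer sizes)."""

    def __init__(self, in_channels: int = 3, num_directions: int = 16):
        super().__init__()
        c = in_channels + 2  # + 2 coordinate channels
        self.encoder = nn.Sequential(
            nn.Conv2d(c, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(64, 32, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(32, num_directions, 4, stride=2, padding=1),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        x = add_coord_channels(image)
        return self.decoder(self.encoder(x))  # (B, directions, H, W)


if __name__ == "__main__":
    net = HourglassQMap()
    q_map = net(torch.randn(1, 3, 64, 64))
    # Greedy action: flat index of the pixel/direction with the highest Q.
    best_action = q_map.flatten(1).argmax(dim=1)
    print(q_map.shape, best_action)
```

Treating the Q-function as a dense output map lets a single forward pass score every candidate push location and direction, which is what makes the image-to-image framing attractive for large state-action spaces.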