Learning Modular Neural Network Policies for Multi-Task and Multi-Robot Transfer
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Reinforcement learning (RL) can automate a wide variety of robotic skills,
but learning each new skill requires considerable real-world data collection
and manual representation engineering to design policy classes or features.
Using deep reinforcement learning to train general purpose neural network
policies alleviates some of the burden of manual representation engineering by
using expressive policy classes, but exacerbates the challenge of data
collection, since such methods tend to be less efficient than RL with
low-dimensional, hand-designed representations. Transfer learning can mitigate
this problem by enabling us to transfer information from one skill to another
and even from one robot to another. We show that neural network policies can be
decomposed into "task-specific" and "robot-specific" modules, where the
task-specific modules are shared across robots, and the robot-specific modules
are shared across all tasks on that robot. This allows for sharing task
information, such as perception, between robots and sharing robot information,
such as dynamics and kinematics, between tasks. We exploit this decomposition
to train mix-and-match modules that can solve new robot-task combinations that
were not seen during training. Using a novel neural network architecture, we
demonstrate the effectiveness of our transfer method for enabling zero-shot
generalization with a variety of robots and tasks in simulation for both visual
and non-visual tasks.
DOI: 10.48550/arxiv.1609.07088
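
The decomposition described in the abstract, a task-specific module shared across robots composed with a robot-specific module shared across tasks, can be illustrated with a short sketch. The code below is only a rough, assumption-laden illustration, not the authors' architecture: the class names, layer sizes, latent interface, and the `policy` helper are all hypothetical.

```python
# Illustrative sketch of a modular policy split into task-specific and
# robot-specific parts (assumptions throughout; not the paper's exact model).
import torch
import torch.nn as nn


class TaskModule(nn.Module):
    """Task-specific module (e.g. perception); intended to be shared across robots."""

    def __init__(self, task_obs_dim: int, latent_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(task_obs_dim, 64), nn.ReLU(),
            nn.Linear(64, latent_dim),
        )

    def forward(self, task_obs: torch.Tensor) -> torch.Tensor:
        # Map a task observation to a latent code consumed by any robot module.
        return self.net(task_obs)


class RobotModule(nn.Module):
    """Robot-specific module (e.g. dynamics/kinematics); intended to be shared across tasks."""

    def __init__(self, latent_dim: int, robot_state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(latent_dim + robot_state_dim, 64), nn.ReLU(),
            nn.Linear(64, action_dim),
        )

    def forward(self, latent: torch.Tensor, robot_state: torch.Tensor) -> torch.Tensor:
        # Combine the task latent with the robot's own state to produce an action.
        return self.net(torch.cat([latent, robot_state], dim=-1))


def policy(task_mod: TaskModule, robot_mod: RobotModule,
           task_obs: torch.Tensor, robot_state: torch.Tensor) -> torch.Tensor:
    # Mix-and-match: the policy for a (task, robot) pair is simply the
    # composition of the two modules, so pairs never trained together
    # can still be assembled.
    return robot_mod(task_mod(task_obs), robot_state)
```

Because any task module instance can be composed with any robot module instance through the shared latent interface, new robot-task combinations can be assembled without retraining, which mirrors the zero-shot "mix-and-match" idea in the abstract.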