Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control
Main authors: | , , , , |
---|---|
Format: | Article |
Language: | English |
Abstract: | Autonomous agents situated in real-world environments must be able to master
large repertoires of skills. While a single short skill can be learned quickly,
it would be impractical to learn every task independently. Instead, the agent
should share knowledge across behaviors such that each task can be learned
efficiently, and such that the resulting model can generalize to new tasks,
especially ones that are compositions or subsets of tasks seen previously. A
policy conditioned on a goal or demonstration can potentially share
knowledge between tasks if it sees sufficiently diverse inputs. However, such
methods may fail to generalize to more complex tasks at test time. We introduce
compositional plan vectors (CPVs) to enable a policy to perform compositions of
tasks without additional supervision. CPVs represent trajectories as the sum of
the subtasks within them. We show that CPVs can be learned within a one-shot
imitation learning framework without any additional supervision or information
about task hierarchy, and enable a demonstration-conditioned policy to
generalize to tasks that sequence twice as many skills as the tasks seen during
training.
Like word embeddings such as word2vec in NLP, CPVs also support
simple arithmetic operations: for example, we can add the CPVs of two
different tasks to command an agent to perform both tasks, without any
additional training. |
---|---|
DOI: | 10.48550/arxiv.1910.14033 |
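To make the additive idea in the abstract concrete, here is a minimal toy sketch in Python (NumPy). It assumes every subtask already has a fixed vector and that a trajectory's plan vector is simply the sum of its subtask vectors. The names `plan_vector`, `SUBTASK_VECS`, and the subtask labels are hypothetical, and the random vectors are placeholders for the CPVs that the abstract describes as learned within a one-shot imitation learning framework. This is an illustration of the arithmetic only, not the authors' implementation.

```python
import numpy as np

# Assumption (hypothetical): each subtask has a fixed plan vector. In the
# described method these would be learned; here they are random placeholders.
DIM = 8
rng = np.random.default_rng(seed=0)
SUBTASK_VECS = {
    name: rng.normal(size=DIM)
    for name in ("pick_red", "place_red", "pick_blue", "place_blue")
}


def plan_vector(subtasks):
    """Toy encoder: a trajectory's vector is the sum of its subtask vectors."""
    # Start from a zero vector so an empty trajectory maps to the origin.
    return sum((SUBTASK_VECS[s] for s in subtasks), np.zeros(DIM))


task_a = ["pick_red", "place_red"]
task_b = ["pick_blue", "place_blue"]

# Composition by addition: the sum of two task vectors equals the vector of
# a trajectory that performs both tasks.
composed = plan_vector(task_a) + plan_vector(task_b)
print(np.allclose(composed, plan_vector(task_a + task_b)))  # True

# The same additivity means subtracting the completed part leaves the
# vector of the remaining work (task_b once task_a is done).
remaining = plan_vector(task_a + task_b) - plan_vector(task_a)
print(np.allclose(remaining, plan_vector(task_b)))  # True
```

Because this toy encoder is an exact sum, both identities hold up to floating-point rounding; with a learned encoder they would at best hold approximately. Note also that in this toy the sum discards subtask order, so any ordering would have to be recovered from other information.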