Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control

Autonomous agents situated in real-world environments must be able to master large repertoires of skills. While a single short skill can be learned quickly, it would be impractical to learn every task independently. Instead, the agent should share knowledge across behaviors such that each task can b...

Bibliographic Details
Main authors: Devin, Coline; Geng, Daniel; Abbeel, Pieter; Darrell, Trevor; Levine, Sergey
Format: Article
Language: eng
creator Devin, Coline
Geng, Daniel
Abbeel, Pieter
Darrell, Trevor
Levine, Sergey
description Autonomous agents situated in real-world environments must be able to master large repertoires of skills. While a single short skill can be learned quickly, it would be impractical to learn every task independently. Instead, the agent should share knowledge across behaviors such that each task can be learned efficiently, and such that the resulting model can generalize to new tasks, especially ones that are compositions or subsets of tasks seen previously. A policy conditioned on a goal or demonstration has the potential to share knowledge between tasks if it sees enough diversity of inputs. However, these methods may not generalize to a more complex task at test time. We introduce compositional plan vectors (CPVs) to enable a policy to perform compositions of tasks without additional supervision. CPVs represent trajectories as the sum of the subtasks within them. We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training. Analogously to embeddings such as word2vec in NLP, CPVs can also support simple arithmetic operations -- for example, we can add the CPVs for two different tasks to command an agent to compose both tasks, without any additional training.
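The additive property described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the subtask names and the hand-picked vectors in `SUBTASK_VECTORS` are hypothetical, and the abstract describes a learned embedding rather than a lookup table. The sketch only shows what it means for a trajectory's plan vector to equal the sum of its subtask vectors, and for two task vectors to be added to compose both tasks.

```python
# Toy illustration only: the vectors below are hand-picked (hypothetical),
# whereas the abstract describes embeddings learned from demonstrations.
SUBTASK_VECTORS = {
    "chop_tree": [1.0, 0.0, 0.0],
    "build_bridge": [0.0, 1.0, 0.0],
    "mine_ore": [0.0, 0.0, 1.0],
}


def add(u, v):
    """Element-wise sum of two equal-length vectors."""
    return [a + b for a, b in zip(u, v)]


def subtract(u, v):
    """Element-wise difference of two equal-length vectors."""
    return [a - b for a, b in zip(u, v)]


def plan_vector(subtasks):
    """Represent a trajectory as the sum of the vectors of its subtasks."""
    total = [0.0, 0.0, 0.0]
    for name in subtasks:
        total = add(total, SUBTASK_VECTORS[name])
    return total


# Two separate tasks, each with its own plan vector.
task_a = plan_vector(["chop_tree", "build_bridge"])
task_b = plan_vector(["mine_ore"])

# Adding the two vectors yields the vector of the combined task.
composed = add(task_a, task_b)
assert composed == plan_vector(["chop_tree", "build_bridge", "mine_ore"])

# A consequence of additivity (illustrative): subtracting the vector of the
# work already done leaves the vector of the work that remains.
remaining = subtract(composed, plan_vector(["chop_tree"]))
assert remaining == plan_vector(["build_bridge", "mine_ore"])
```

Because a sum does not depend on the order of its terms, this toy representation records which subtasks a trajectory contains but not the order in which they occur.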
doi_str_mv 10.48550/arxiv.1910.14033
format Article
fullrecord <record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_1910_14033</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1910_14033</sourcerecordid><originalsourceid>FETCH-LOGICAL-a673-79ff5f4f1c5eef4fc8dfce2ebae93c8aa1589841126c02bbcf7852ea5d9259783</originalsourceid><addsrcrecordid>eNotj0FOwzAURL1hgQoHYFVfIG1sx4nNrkRtQSqCRdRt9ON-CwsnrhxTwe0bAqsnzYxGeoQ8sHxVKCnzNcRvd1kxPQWsyIW4JU_vHga6iS599JiceaR16M9hdMmFATyd6yOaFOJIbYj09csnlzUwfk7LIcXg78iNBT_i_T8XpNltm_o5O7ztX-rNIYOyElmlrZW2sMxIxIlGnaxBjh2gFkYBMKm0Khjjpcl51xlbKckR5ElzqSslFmT5dztLtOfoeog_7a9MO8uIK-oCRTs</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control</title><source>arXiv.org</source><creator>Devin, Coline ; Geng, Daniel ; Abbeel, Pieter ; Darrell, Trevor ; Levine, Sergey</creator><creatorcontrib>Devin, Coline ; Geng, Daniel ; Abbeel, Pieter ; Darrell, Trevor ; Levine, Sergey</creatorcontrib><description>Autonomous agents situated in real-world environments must be able to master large repertoires of skills. While a single short skill can be learned quickly, it would be impractical to learn every task independently. Instead, the agent should share knowledge across behaviors such that each task can be learned efficiently, and such that the resulting model can generalize to new tasks, especially ones that are compositions or subsets of tasks seen previously. A policy conditioned on a goal or demonstration has the potential to share knowledge between tasks if it sees enough diversity of inputs. However, these methods may not generalize to a more complex task at test time. We introduce compositional plan vectors (CPVs) to enable a policy to perform compositions of tasks without additional supervision. CPVs represent trajectories as the sum of the subtasks within them. 
We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training. Analogously to embeddings such as word2vec in NLP, CPVs can also support simple arithmetic operations -- for example, we can add the CPVs for two different tasks to command an agent to compose both tasks, without any additional training.</description><identifier>DOI: 10.48550/arxiv.1910.14033</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Learning ; Computer Science - Robotics ; Statistics - Machine Learning</subject><creationdate>2019-10</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/1910.14033$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.1910.14033$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Devin, Coline</creatorcontrib><creatorcontrib>Geng, Daniel</creatorcontrib><creatorcontrib>Abbeel, Pieter</creatorcontrib><creatorcontrib>Darrell, Trevor</creatorcontrib><creatorcontrib>Levine, Sergey</creatorcontrib><title>Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control</title><description>Autonomous agents situated in real-world environments must be able to master large repertoires of skills. While a single short skill can be learned quickly, it would be impractical to learn every task independently. 
Instead, the agent should share knowledge across behaviors such that each task can be learned efficiently, and such that the resulting model can generalize to new tasks, especially ones that are compositions or subsets of tasks seen previously. A policy conditioned on a goal or demonstration has the potential to share knowledge between tasks if it sees enough diversity of inputs. However, these methods may not generalize to a more complex task at test time. We introduce compositional plan vectors (CPVs) to enable a policy to perform compositions of tasks without additional supervision. CPVs represent trajectories as the sum of the subtasks within them. We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training. Analogously to embeddings such as word2vec in NLP, CPVs can also support simple arithmetic operations -- for example, we can add the CPVs for two different tasks to command an agent to compose both tasks, without any additional training.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Learning</subject><subject>Computer Science - Robotics</subject><subject>Statistics - Machine Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj0FOwzAURL1hgQoHYFVfIG1sx4nNrkRtQSqCRdRt9ON-CwsnrhxTwe0bAqsnzYxGeoQ8sHxVKCnzNcRvd1kxPQWsyIW4JU_vHga6iS599JiceaR16M9hdMmFATyd6yOaFOJIbYj09csnlzUwfk7LIcXg78iNBT_i_T8XpNltm_o5O7ztX-rNIYOyElmlrZW2sMxIxIlGnaxBjh2gFkYBMKm0Khjjpcl51xlbKckR5ElzqSslFmT5dztLtOfoeog_7a9MO8uIK-oCRTs</recordid><startdate>20191030</startdate><enddate>20191030</enddate><creator>Devin, Coline</creator><creator>Geng, Daniel</creator><creator>Abbeel, 
Pieter</creator><creator>Darrell, Trevor</creator><creator>Levine, Sergey</creator><scope>AKY</scope><scope>EPD</scope><scope>GOX</scope></search><sort><creationdate>20191030</creationdate><title>Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control</title><author>Devin, Coline ; Geng, Daniel ; Abbeel, Pieter ; Darrell, Trevor ; Levine, Sergey</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a673-79ff5f4f1c5eef4fc8dfce2ebae93c8aa1589841126c02bbcf7852ea5d9259783</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Learning</topic><topic>Computer Science - Robotics</topic><topic>Statistics - Machine Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Devin, Coline</creatorcontrib><creatorcontrib>Geng, Daniel</creatorcontrib><creatorcontrib>Abbeel, Pieter</creatorcontrib><creatorcontrib>Darrell, Trevor</creatorcontrib><creatorcontrib>Levine, Sergey</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv Statistics</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Devin, Coline</au><au>Geng, Daniel</au><au>Abbeel, Pieter</au><au>Darrell, Trevor</au><au>Levine, Sergey</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control</atitle><date>2019-10-30</date><risdate>2019</risdate><abstract>Autonomous agents situated in real-world environments must be able to master large repertoires of skills. While a single short skill can be learned quickly, it would be impractical to learn every task independently. 
Instead, the agent should share knowledge across behaviors such that each task can be learned efficiently, and such that the resulting model can generalize to new tasks, especially ones that are compositions or subsets of tasks seen previously. A policy conditioned on a goal or demonstration has the potential to share knowledge between tasks if it sees enough diversity of inputs. However, these methods may not generalize to a more complex task at test time. We introduce compositional plan vectors (CPVs) to enable a policy to perform compositions of tasks without additional supervision. CPVs represent trajectories as the sum of the subtasks within them. We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training. Analogously to embeddings such as word2vec in NLP, CPVs can also support simple arithmetic operations -- for example, we can add the CPVs for two different tasks to command an agent to compose both tasks, without any additional training.</abstract><doi>10.48550/arxiv.1910.14033</doi><oa>free_for_read</oa></addata></record>
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.1910.14033
language eng
recordid cdi_arxiv_primary_1910_14033
source arXiv.org
subjects Computer Science - Artificial Intelligence
Computer Science - Learning
Computer Science - Robotics
Statistics - Machine Learning
title Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T11%3A03%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Plan%20Arithmetic:%20Compositional%20Plan%20Vectors%20for%20Multi-Task%20Control&rft.au=Devin,%20Coline&rft.date=2019-10-30&rft_id=info:doi/10.48550/arxiv.1910.14033&rft_dat=%3Carxiv_GOX%3E1910_14033%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true