Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control

Bibliographic details
Main authors:	Devin, Coline; Geng, Daniel; Abbeel, Pieter; Darrell, Trevor; Levine, Sergey
Format:	Article
Language:	eng
Subjects:	Computer Science - Artificial Intelligence; Computer Science - Learning; Computer Science - Robotics; Statistics - Machine Learning

creator	Devin, Coline; Geng, Daniel; Abbeel, Pieter; Darrell, Trevor; Levine, Sergey
description	Autonomous agents situated in real-world environments must be able to master large repertoires of skills. While a single short skill can be learned quickly, it would be impractical to learn every task independently. Instead, the agent should share knowledge across behaviors such that each task can be learned efficiently, and such that the resulting model can generalize to new tasks, especially ones that are compositions or subsets of tasks seen previously. A policy conditioned on a goal or demonstration has the potential to share knowledge between tasks if it sees enough diversity of inputs. However, these methods may not generalize to a more complex task at test time. We introduce compositional plan vectors (CPVs) to enable a policy to perform compositions of tasks without additional supervision. CPVs represent trajectories as the sum of the subtasks within them. We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training. Analogously to embeddings such as word2vec in NLP, CPVs can also support simple arithmetic operations -- for example, we can add the CPVs for two different tasks to command an agent to compose both tasks, without any additional training.
doi_str_mv	10.48550/arxiv.1910.14033
format	Article
fullrecord	<record><control><sourceid>arxiv_GOX</sourceid><recordid>TN_cdi_arxiv_primary_1910_14033</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><sourcerecordid>1910_14033</sourcerecordid><originalsourceid>FETCH-LOGICAL-a673-79ff5f4f1c5eef4fc8dfce2ebae93c8aa1589841126c02bbcf7852ea5d9259783</originalsourceid><addsrcrecordid>eNotj0FOwzAURL1hgQoHYFVfIG1sx4nNrkRtQSqCRdRt9ON-CwsnrhxTwe0bAqsnzYxGeoQ8sHxVKCnzNcRvd1kxPQWsyIW4JU_vHga6iS599JiceaR16M9hdMmFATyd6yOaFOJIbYj09csnlzUwfk7LIcXg78iNBT_i_T8XpNltm_o5O7ztX-rNIYOyElmlrZW2sMxIxIlGnaxBjh2gFkYBMKm0Khjjpcl51xlbKckR5ElzqSslFmT5dztLtOfoeog_7a9MO8uIK-oCRTs</addsrcrecordid><sourcetype>Open Access Repository</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype></control><display><type>article</type><title>Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control</title><source>arXiv.org</source><creator>Devin, Coline ; Geng, Daniel ; Abbeel, Pieter ; Darrell, Trevor ; Levine, Sergey</creator><creatorcontrib>Devin, Coline ; Geng, Daniel ; Abbeel, Pieter ; Darrell, Trevor ; Levine, Sergey</creatorcontrib><description>Autonomous agents situated in real-world environments must be able to master large repertoires of skills. While a single short skill can be learned quickly, it would be impractical to learn every task independently. Instead, the agent should share knowledge across behaviors such that each task can be learned efficiently, and such that the resulting model can generalize to new tasks, especially ones that are compositions or subsets of tasks seen previously. A policy conditioned on a goal or demonstration has the potential to share knowledge between tasks if it sees enough diversity of inputs. However, these methods may not generalize to a more complex task at test time. We introduce compositional plan vectors (CPVs) to enable a policy to perform compositions of tasks without additional supervision. CPVs represent trajectories as the sum of the subtasks within them. 
We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training. Analogously to embeddings such as word2vec in NLP, CPVs can also support simple arithmetic operations -- for example, we can add the CPVs for two different tasks to command an agent to compose both tasks, without any additional training.</description><identifier>DOI: 10.48550/arxiv.1910.14033</identifier><language>eng</language><subject>Computer Science - Artificial Intelligence ; Computer Science - Learning ; Computer Science - Robotics ; Statistics - Machine Learning</subject><creationdate>2019-10</creationdate><rights>http://arxiv.org/licenses/nonexclusive-distrib/1.0</rights><oa>free_for_read</oa><woscitedreferencessubscribed>false</woscitedreferencessubscribed></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><link.rule.ids>228,230,780,885</link.rule.ids><linktorsrc>$$Uhttps://arxiv.org/abs/1910.14033$$EView_record_in_Cornell_University$$FView_record_in_$$GCornell_University$$Hfree_for_read</linktorsrc><backlink>$$Uhttps://doi.org/10.48550/arXiv.1910.14033$$DView paper in arXiv$$Hfree_for_read</backlink></links><search><creatorcontrib>Devin, Coline</creatorcontrib><creatorcontrib>Geng, Daniel</creatorcontrib><creatorcontrib>Abbeel, Pieter</creatorcontrib><creatorcontrib>Darrell, Trevor</creatorcontrib><creatorcontrib>Levine, Sergey</creatorcontrib><title>Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control</title><description>Autonomous agents situated in real-world environments must be able to master large repertoires of skills. While a single short skill can be learned quickly, it would be impractical to learn every task independently. 
Instead, the agent should share knowledge across behaviors such that each task can be learned efficiently, and such that the resulting model can generalize to new tasks, especially ones that are compositions or subsets of tasks seen previously. A policy conditioned on a goal or demonstration has the potential to share knowledge between tasks if it sees enough diversity of inputs. However, these methods may not generalize to a more complex task at test time. We introduce compositional plan vectors (CPVs) to enable a policy to perform compositions of tasks without additional supervision. CPVs represent trajectories as the sum of the subtasks within them. We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training. Analogously to embeddings such as word2vec in NLP, CPVs can also support simple arithmetic operations -- for example, we can add the CPVs for two different tasks to command an agent to compose both tasks, without any additional training.</description><subject>Computer Science - Artificial Intelligence</subject><subject>Computer Science - Learning</subject><subject>Computer Science - Robotics</subject><subject>Statistics - Machine Learning</subject><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2019</creationdate><recordtype>article</recordtype><sourceid>GOX</sourceid><recordid>eNotj0FOwzAURL1hgQoHYFVfIG1sx4nNrkRtQSqCRdRt9ON-CwsnrhxTwe0bAqsnzYxGeoQ8sHxVKCnzNcRvd1kxPQWsyIW4JU_vHga6iS599JiceaR16M9hdMmFATyd6yOaFOJIbYj09csnlzUwfk7LIcXg78iNBT_i_T8XpNltm_o5O7ztX-rNIYOyElmlrZW2sMxIxIlGnaxBjh2gFkYBMKm0Khjjpcl51xlbKckR5ElzqSslFmT5dztLtOfoeog_7a9MO8uIK-oCRTs</recordid><startdate>20191030</startdate><enddate>20191030</enddate><creator>Devin, Coline</creator><creator>Geng, Daniel</creator><creator>Abbeel, 
Pieter</creator><creator>Darrell, Trevor</creator><creator>Levine, Sergey</creator><scope>AKY</scope><scope>EPD</scope><scope>GOX</scope></search><sort><creationdate>20191030</creationdate><title>Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control</title><author>Devin, Coline ; Geng, Daniel ; Abbeel, Pieter ; Darrell, Trevor ; Levine, Sergey</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-a673-79ff5f4f1c5eef4fc8dfce2ebae93c8aa1589841126c02bbcf7852ea5d9259783</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2019</creationdate><topic>Computer Science - Artificial Intelligence</topic><topic>Computer Science - Learning</topic><topic>Computer Science - Robotics</topic><topic>Statistics - Machine Learning</topic><toplevel>online_resources</toplevel><creatorcontrib>Devin, Coline</creatorcontrib><creatorcontrib>Geng, Daniel</creatorcontrib><creatorcontrib>Abbeel, Pieter</creatorcontrib><creatorcontrib>Darrell, Trevor</creatorcontrib><creatorcontrib>Levine, Sergey</creatorcontrib><collection>arXiv Computer Science</collection><collection>arXiv Statistics</collection><collection>arXiv.org</collection></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Devin, Coline</au><au>Geng, Daniel</au><au>Abbeel, Pieter</au><au>Darrell, Trevor</au><au>Levine, Sergey</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control</atitle><date>2019-10-30</date><risdate>2019</risdate><abstract>Autonomous agents situated in real-world environments must be able to master large repertoires of skills. While a single short skill can be learned quickly, it would be impractical to learn every task independently. 
Instead, the agent should share knowledge across behaviors such that each task can be learned efficiently, and such that the resulting model can generalize to new tasks, especially ones that are compositions or subsets of tasks seen previously. A policy conditioned on a goal or demonstration has the potential to share knowledge between tasks if it sees enough diversity of inputs. However, these methods may not generalize to a more complex task at test time. We introduce compositional plan vectors (CPVs) to enable a policy to perform compositions of tasks without additional supervision. CPVs represent trajectories as the sum of the subtasks within them. We show that CPVs can be learned within a one-shot imitation learning framework without any additional supervision or information about task hierarchy, and enable a demonstration-conditioned policy to generalize to tasks that sequence twice as many skills as the tasks seen during training. Analogously to embeddings such as word2vec in NLP, CPVs can also support simple arithmetic operations -- for example, we can add the CPVs for two different tasks to command an agent to compose both tasks, without any additional training.</abstract><doi>10.48550/arxiv.1910.14033</doi><oa>free_for_read</oa></addata></record>
fulltext	fulltext_linktorsrc
identifier	DOI: 10.48550/arxiv.1910.14033
language	eng
recordid	cdi_arxiv_primary_1910_14033
source	arXiv.org
subjects	Computer Science - Artificial Intelligence; Computer Science - Learning; Computer Science - Robotics; Statistics - Machine Learning
title	Plan Arithmetic: Compositional Plan Vectors for Multi-Task Control
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-23T11%3A03%3A40IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Plan%20Arithmetic:%20Compositional%20Plan%20Vectors%20for%20Multi-Task%20Control&rft.au=Devin,%20Coline&rft.date=2019-10-30&rft_id=info:doi/10.48550/arxiv.1910.14033&rft_dat=%3Carxiv_GOX%3E1910_14033%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true
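The abstract's central idea is that a trajectory's compositional plan vector (CPV) is the sum of the vectors of the subtasks it contains, so adding the CPVs of two tasks should specify doing both. The toy sketch below illustrates that arithmetic. Everything in it is a hypothetical assumption, not the authors' implementation: the subtask names, the fixed one-hot vectors (the paper learns its embeddings from demonstrations), the helper names `cpv`, `next_subtask` and `execute`, and the greedy rule standing in for a learned demonstration-conditioned policy. Treating "target CPV minus progress CPV" as the plan still left to do is one way to read the additive property, assumed here for illustration.

```python
from typing import Dict, List, Sequence

# Hypothetical subtask embeddings. The paper learns CPVs from demonstrations;
# these fixed one-hot vectors exist only to make the arithmetic visible.
SUBTASK_VECTORS: Dict[str, List[float]] = {
    "pick_wood": [1.0, 0.0, 0.0],
    "build_fire": [0.0, 1.0, 0.0],
    "eat_bread": [0.0, 0.0, 1.0],
}
DIM = len(next(iter(SUBTASK_VECTORS.values())))


def add(u: Sequence[float], v: Sequence[float]) -> List[float]:
    """Element-wise sum u + v."""
    return [a + b for a, b in zip(u, v)]


def sub(u: Sequence[float], v: Sequence[float]) -> List[float]:
    """Element-wise difference u - v."""
    return [a - b for a, b in zip(u, v)]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product, used to score how well a subtask matches a plan."""
    return sum(a * b for a, b in zip(u, v))


def cpv(subtasks: Sequence[str]) -> List[float]:
    """Toy CPV of a trajectory: the sum of its subtasks' vectors.

    Additivity holds exactly here by construction; a learned encoder
    would only be trained to approximate it.
    """
    total = [0.0] * DIM
    for name in subtasks:
        total = add(total, SUBTASK_VECTORS[name])
    return total


def next_subtask(remaining: Sequence[float]) -> str:
    """Hypothetical greedy stand-in for a learned policy: pick the subtask
    whose vector best matches what is still left to do (ties go to the
    first subtask in dict insertion order)."""
    return max(SUBTASK_VECTORS, key=lambda name: dot(SUBTASK_VECTORS[name], remaining))


def execute(target: Sequence[float], max_steps: int = 20) -> List[str]:
    """Act until the progress CPV matches the target CPV.

    The remaining plan is assumed to be target - progress.
    """
    progress: List[str] = []
    for _ in range(max_steps):
        remaining = sub(target, cpv(progress))
        if all(abs(x) < 1e-9 for x in remaining):
            return progress
        progress.append(next_subtask(remaining))
    raise RuntimeError("target not reached within max_steps")


# Compose two tasks by adding their CPVs, with no new demonstration.
task_a = cpv(["pick_wood"])
task_b = cpv(["build_fire", "eat_bread"])
print(execute(add(task_a, task_b)))  # -> ['pick_wood', 'build_fire', 'eat_bread']
```

Because vector addition is commutative, this toy CPV records which subtasks remain but not their order; the execution order above comes only from the tie-breaking rule in `next_subtask`.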