Imitation Learning via Simultaneous Optimization of Policies and Auxiliary Trajectories

Imitation learning (IL) is a frequently used approach for data-efficient policy learning. Many IL methods, such as Dataset Aggregation (DAgger), combat challenges like distributional shift by interacting with oracular experts. Unfortunately, assuming access to oracular experts is often unrealistic in practice; data used in IL frequently comes from offline processes such as lead-through or teleoperation. In this paper, we present a novel imitation learning technique called Collocation for Demonstration Encoding (CoDE) that operates on only a fixed set of trajectory demonstrations. We circumvent challenges with methods like back-propagation-through-time by introducing an auxiliary trajectory network, which takes inspiration from collocation techniques in optimal control. Our method generalizes well and more accurately reproduces the demonstrated behavior with fewer guiding trajectories when compared to standard behavioral cloning methods. We present simulation results on a 7-degree-of-freedom (DoF) robotic manipulator that learns to exhibit lifting, target-reaching, and obstacle avoidance behaviors.
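
The abstract's key mechanism is collocation-style training: instead of back-propagating through a full policy rollout, the policy and an auxiliary state trajectory are optimized jointly, with a penalty that ties adjacent time steps together. The sketch below is a minimal illustration of that idea under assumed simplifications (single-integrator dynamics, a linear policy, one synthetic straight-line demonstration, and a soft quadratic consistency penalty in place of hard collocation constraints); it is not the paper's CoDE implementation.

```python
# Minimal sketch of collocation-style imitation learning. Illustrative only, not
# the paper's CoDE implementation. Hypothetical setup: 2-D states, single-
# integrator dynamics x_{t+1} = x_t + a_t, a linear policy a = W x + b, and a
# quadratic penalty standing in for hard collocation constraints.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic demonstration: a straight-line reach toward a target.
T, dim = 30, 2
demo = np.linspace([0.0, 0.0], [1.0, 0.5], T)        # (T, dim) demonstrated states

# Decision variables: policy parameters and an auxiliary state trajectory.
W = 0.01 * rng.standard_normal((dim, dim))
b = np.zeros(dim)
aux = demo + 0.01 * rng.standard_normal(demo.shape)  # auxiliary states x_t

lam = 5.0                   # weight on the policy/dynamics consistency penalty
lr_x, lr_p = 0.01, 0.001    # step sizes for the trajectory and the policy

def policy(x, W, b):
    return x @ W.T + b      # action = predicted state increment

for _ in range(5000):
    pred_next = aux[:-1] + policy(aux[:-1], W, b)    # one-step predictions along the aux trajectory
    r_fit = aux - demo                               # demonstration-matching residual
    r_dyn = aux[1:] - pred_next                      # policy/dynamics consistency residual

    # Manual gradients of 0.5*||r_fit||^2 + 0.5*lam*||r_dyn||^2 (all terms are quadratic).
    g_aux = r_fit.copy()
    g_aux[1:] += lam * r_dyn                         # d/dx_{t+1}
    g_aux[:-1] -= lam * (r_dyn + r_dyn @ W)          # d/dx_t through pred_next = x_t + W x_t + b
    g_W = -lam * r_dyn.T @ aux[:-1]                  # d/dW
    g_b = -lam * r_dyn.sum(axis=0)                   # d/db

    aux -= lr_x * g_aux                              # joint update: auxiliary trajectory ...
    W -= lr_p * g_W                                  # ... and policy parameters
    b -= lr_p * g_b

# Rolling the learned policy out from the demonstration's start state should now
# approximately reproduce the demonstrated reach.
x = demo[0].copy()
for _ in range(T - 1):
    x = x + policy(x, W, b)
print("rollout endpoint:", x, " demo endpoint:", demo[-1])
```

Because the consistency terms couple only neighboring states, every gradient here is local in time, which is the property the auxiliary trajectory is meant to provide in place of naive back-propagation-through-time.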

Bibliographic Details
Main Authors: Xie, Mandy, Li, Anqi, Van Wyk, Karl, Dellaert, Frank, Boots, Byron, Ratliff, Nathan
Format: Article
Language: English
Subjects: Computer Science - Learning; Computer Science - Robotics
DOI: 10.48550/arxiv.2105.03019
Published: 2021-05-06
Source: arXiv.org