Imitation Learning via Simultaneous Optimization of Policies and Auxiliary Trajectories

Imitation learning (IL) is a frequently used approach for data-efficient policy learning. Many IL methods, such as Dataset Aggregation (DAgger), combat challenges like distributional shift by interacting with oracular experts. Unfortunately, assuming access to oracular experts is often unrealistic in practice; data used in IL frequently comes from offline processes such as lead-through or teleoperation. In this paper, we present a novel imitation learning technique called Collocation for Demonstration Encoding (CoDE) that operates on only a fixed set of trajectory demonstrations. We circumvent challenges with methods like back-propagation-through-time by introducing an auxiliary trajectory network, which takes inspiration from collocation techniques in optimal control. Our method generalizes well and more accurately reproduces the demonstrated behavior with fewer guiding trajectories when compared to standard behavioral cloning methods. We present simulation results on a 7-degree-of-freedom (DoF) robotic manipulator that learns to exhibit lifting, target-reaching, and obstacle avoidance behaviors.
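
The abstract's key mechanism is collocation-style training: instead of back-propagating through a full policy rollout, the policy and an auxiliary state trajectory are optimized jointly, with a penalty that ties adjacent time steps together. The sketch below is a minimal illustration of that idea under assumed simplifications (single-integrator dynamics, a linear policy, one synthetic straight-line demonstration, and a soft quadratic consistency penalty in place of hard collocation constraints); it is not the paper's CoDE implementation.

```python
# Minimal sketch of collocation-style imitation learning. Illustrative only, not
# the paper's CoDE implementation. Hypothetical setup: 2-D states, single-
# integrator dynamics x_{t+1} = x_t + a_t, a linear policy a = W x + b, and a
# quadratic penalty standing in for hard collocation constraints.
import numpy as np

rng = np.random.default_rng(0)

# Synthetic demonstration: a straight-line reach toward a target.
T, dim = 30, 2
demo = np.linspace([0.0, 0.0], [1.0, 0.5], T)        # (T, dim) demonstrated states

# Decision variables: policy parameters and an auxiliary state trajectory.
W = 0.01 * rng.standard_normal((dim, dim))
b = np.zeros(dim)
aux = demo + 0.01 * rng.standard_normal(demo.shape)  # auxiliary states x_t

lam = 5.0                   # weight on the policy/dynamics consistency penalty
lr_x, lr_p = 0.01, 0.001    # step sizes for the trajectory and the policy

def policy(x, W, b):
    return x @ W.T + b      # action = predicted state increment

for _ in range(5000):
    pred_next = aux[:-1] + policy(aux[:-1], W, b)    # one-step predictions along the aux trajectory
    r_fit = aux - demo                               # demonstration-matching residual
    r_dyn = aux[1:] - pred_next                      # policy/dynamics consistency residual

    # Manual gradients of 0.5*||r_fit||^2 + 0.5*lam*||r_dyn||^2 (all terms are quadratic).
    g_aux = r_fit.copy()
    g_aux[1:] += lam * r_dyn                         # d/dx_{t+1}
    g_aux[:-1] -= lam * (r_dyn + r_dyn @ W)          # d/dx_t through pred_next = x_t + W x_t + b
    g_W = -lam * r_dyn.T @ aux[:-1]                  # d/dW
    g_b = -lam * r_dyn.sum(axis=0)                   # d/db

    aux -= lr_x * g_aux                              # joint update: auxiliary trajectory ...
    W -= lr_p * g_W                                  # ... and policy parameters
    b -= lr_p * g_b

# Rolling the learned policy out from the demonstration's start state should now
# approximately reproduce the demonstrated reach.
x = demo[0].copy()
for _ in range(T - 1):
    x = x + policy(x, W, b)
print("rollout endpoint:", x, " demo endpoint:", demo[-1])
```

Because the consistency terms couple only neighboring states, every gradient here is local in time, which is the property the auxiliary trajectory is meant to provide in place of naive back-propagation-through-time.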

Bibliographic Details
Main Authors: Xie, Mandy, Li, Anqi, Van Wyk, Karl, Dellaert, Frank, Boots, Byron, Ratliff, Nathan
Format: Article
Language: English
Subjects: Computer Science - Learning; Computer Science - Robotics
DOI: 10.48550/arxiv.2105.03019
Published: 2021-05-06
Source: arXiv.org