Imitation Learning via Simultaneous Optimization of Policies and Auxiliary Trajectories
Saved in:
Main authors: | Xie, Mandy; Li, Anqi; Van Wyk, Karl; Dellaert, Frank; Boots, Byron; Ratliff, Nathan |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Learning; Computer Science - Robotics |
Online access: | Order full text |
creator | Xie, Mandy; Li, Anqi; Van Wyk, Karl; Dellaert, Frank; Boots, Byron; Ratliff, Nathan |
description | Imitation learning (IL) is a frequently used approach for data-efficient policy learning. Many IL methods, such as Dataset Aggregation (DAgger), combat challenges like distributional shift by interacting with oracular experts. Unfortunately, assuming access to oracular experts is often unrealistic in practice; data used in IL frequently comes from offline processes such as lead-through or teleoperation. In this paper, we present a novel imitation learning technique called Collocation for Demonstration Encoding (CoDE) that operates on only a fixed set of trajectory demonstrations. We circumvent challenges with methods like back-propagation-through-time by introducing an auxiliary trajectory network, which takes inspiration from collocation techniques in optimal control. Our method generalizes well and more accurately reproduces the demonstrated behavior with fewer guiding trajectories when compared to standard behavioral cloning methods. We present simulation results on a 7-degree-of-freedom (DoF) robotic manipulator that learns to exhibit lifting, target-reaching, and obstacle avoidance behaviors. |
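The core idea in the abstract, jointly optimizing a policy and free auxiliary trajectory variables so that gradients flow through single steps rather than long rollouts, can be sketched in a few lines. The following is a minimal illustration under stated assumptions, not the paper's actual CoDE formulation: the toy one-dimensional-ramp demonstration, the Euler-style one-step dynamics, the network sizes, and the equal loss weighting are all assumptions made for exposition.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical toy problem: a single length-T demonstration in a 2-D state space.
T, state_dim = 50, 2
demo = torch.linspace(0.0, 1.0, T).unsqueeze(1).repeat(1, state_dim)

# Policy network maps a state to a state increment (assumed Euler-style dynamics).
policy = nn.Sequential(nn.Linear(state_dim, 64), nn.Tanh(), nn.Linear(64, state_dim))

# Auxiliary trajectory: free decision variables, one per time step, initialized
# at the demonstration, in the spirit of collocation methods in optimal control.
aux_traj = nn.Parameter(demo.clone())

# Policy weights and auxiliary trajectory are optimized simultaneously.
opt = torch.optim.Adam(list(policy.parameters()) + [aux_traj], lr=1e-2)

for step in range(2000):
    opt.zero_grad()
    # (1) Keep the auxiliary trajectory close to the demonstration.
    demo_loss = F.mse_loss(aux_traj, demo)
    # (2) Collocation-style consistency: one policy step from each auxiliary
    # state should land on the next auxiliary state. Gradients flow through
    # single steps only, so no back-propagation-through-time is needed.
    pred_next = aux_traj[:-1] + policy(aux_traj[:-1])
    consistency_loss = F.mse_loss(pred_next, aux_traj[1:])
    (demo_loss + consistency_loss).backward()
    opt.step()

print(f"demo loss {demo_loss.item():.4f}, consistency loss {consistency_loss.item():.4f}")
```

As both losses shrink, the auxiliary trajectory settles on a compromise between the demonstration and what the policy can actually reproduce step by step, which is the intuition behind treating trajectory states as decision variables rather than unrolling the policy.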
doi_str_mv | 10.48550/arxiv.2105.03019 |
format | Article |
identifier | DOI: 10.48550/arxiv.2105.03019 |
language | eng |
recordid | cdi_arxiv_primary_2105_03019 |
source | arXiv.org |
subjects | Computer Science - Learning; Computer Science - Robotics
title | Imitation Learning via Simultaneous Optimization of Policies and Auxiliary Trajectories |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T12%3A58%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Imitation%20Learning%20via%20Simultaneous%20Optimization%20of%20Policies%20and%20Auxiliary%20Trajectories&rft.au=Xie,%20Mandy&rft.date=2021-05-06&rft_id=info:doi/10.48550/arxiv.2105.03019&rft_dat=%3Carxiv_GOX%3E2105_03019%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |