Inverse Dynamics Pretraining Learns Good Representations for Multitask Imitation

In recent years, domains such as natural language processing and image recognition have popularized the paradigm of using large datasets to pretrain representations that can be effectively transferred to downstream tasks. In this work we evaluate how such a paradigm should be done in imitation learning, where both pretraining and finetuning data are trajectories collected by experts interacting with an unknown environment. Namely, we consider a setting where the pretraining corpus consists of multitask demonstrations and the task for each demonstration is set by an unobserved latent context variable. The goal is to use the pretraining corpus to learn a low dimensional representation of the high dimensional (e.g., visual) observation space which can be transferred to a novel context for finetuning on a limited dataset of demonstrations. Among a variety of possible pretraining objectives, we argue that inverse dynamics modeling -- i.e., predicting an action given the observations appearing before and after it in the demonstration -- is well-suited to this setting. We provide empirical evidence of this claim through evaluations on a variety of simulated visuomotor manipulation problems. While previous work has attempted various theoretical explanations regarding the benefit of inverse dynamics modeling, we find that these arguments are insufficient to explain the empirical advantages often observed in our settings, and so we derive a novel analysis using a simple but general environment model.
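
The inverse dynamics objective described in the abstract can be summarized in a short sketch. The code below is a minimal illustration under simplifying assumptions, not the authors' implementation: it assumes flat vector observations, an MLP encoder, and a mean-squared-error loss for continuous actions, whereas the paper works with visual observations from simulated manipulation tasks. The encoder is the low-dimensional representation that would be transferred to behavior-cloning finetuning on a new task.

```python
# Hypothetical sketch of inverse dynamics pretraining (names and architecture are
# illustrative assumptions, not the paper's actual model).
import torch
import torch.nn as nn


class InverseDynamicsModel(nn.Module):
    def __init__(self, obs_dim: int, repr_dim: int, action_dim: int):
        super().__init__()
        # Encoder phi: observation -> low-dimensional representation (reused downstream).
        self.encoder = nn.Sequential(
            nn.Linear(obs_dim, 256), nn.ReLU(), nn.Linear(256, repr_dim)
        )
        # Inverse dynamics head: (phi(o_t), phi(o_{t+1})) -> predicted action a_t.
        self.head = nn.Sequential(
            nn.Linear(2 * repr_dim, 256), nn.ReLU(), nn.Linear(256, action_dim)
        )

    def forward(self, obs_t: torch.Tensor, obs_tp1: torch.Tensor) -> torch.Tensor:
        z_t, z_tp1 = self.encoder(obs_t), self.encoder(obs_tp1)
        return self.head(torch.cat([z_t, z_tp1], dim=-1))


def pretrain_step(model, optimizer, obs_t, obs_tp1, action_t):
    """One gradient step on the inverse dynamics objective (MSE for continuous actions)."""
    pred_action = model(obs_t, obs_tp1)
    loss = nn.functional.mse_loss(pred_action, action_t)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```

After pretraining on the multitask demonstration corpus, only `model.encoder` would be kept and a new policy head fit on the limited downstream demonstrations.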

Bibliographic Details
Main Authors: Brandfonbrener, David; Nachum, Ofir; Bruna, Joan
Format: Article
Language: English
Subjects: Computer Science - Learning
Date: 2023-05-26
Source: arXiv.org
DOI: 10.48550/arxiv.2305.16985
Online Access: https://arxiv.org/abs/2305.16985