Robust Offline Imitation Learning from Diverse Auxiliary Data
Main authors: Ghosh, Udita; Raychaudhuri, Dripta S; Li, Jiachen; Karydis, Konstantinos; Roy-Chowdhury, Amit K
Format: Article
Language: English
Subjects: Computer Science - Learning
Online access: Order full text
creator | Ghosh, Udita; Raychaudhuri, Dripta S; Li, Jiachen; Karydis, Konstantinos; Roy-Chowdhury, Amit K |
description | Offline imitation learning enables learning a policy solely from a set of
expert demonstrations, without any environment interaction. To alleviate the
issue of distribution shift arising due to the small amount of expert data,
recent works incorporate large numbers of auxiliary demonstrations alongside
the expert data. However, the performance of these approaches relies on
assumptions about the quality and composition of the auxiliary data, and
they are rarely successful when those assumptions do not hold. To address this
limitation, we propose Robust Offline Imitation from Diverse Auxiliary Data
(ROIDA). ROIDA first identifies high-quality transitions from the entire
auxiliary dataset using a learned reward function. These high-reward samples
are combined with the expert demonstrations for weighted behavioral cloning.
For lower-quality samples, ROIDA applies temporal difference learning to steer
the policy towards high-reward states, improving long-term returns. This
two-pronged approach enables our framework to effectively leverage both high-
and low-quality data without assumptions about the composition of the auxiliary
data. Extensive experiments validate
that ROIDA achieves robust and consistent performance across multiple auxiliary
datasets with diverse ratios of expert and non-expert demonstrations. ROIDA
effectively leverages unlabeled auxiliary data, outperforming prior methods
reliant on specific data assumptions. |
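To make the two-pronged recipe in the abstract concrete, below is a minimal PyTorch-style sketch of one training step. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not say how the reward function is trained (a learned scorer of expert-likeness is assumed here), and the names (`roida_step`, `reward_fn`, `policy.log_prob`), the sigmoid weighting, the threshold, and the advantage-style weight for low-reward samples are all hypothetical.

```python
import torch
import torch.nn.functional as F

def roida_step(policy, value, reward_fn, expert_batch, aux_batch,
               policy_opt, value_opt, gamma=0.99, thresh=0.5):
    """One illustrative update: weighted BC plus TD learning (names hypothetical)."""
    s_e, a_e = expert_batch            # expert states and actions
    s, a, s_next = aux_batch           # unlabeled auxiliary transitions

    # Score auxiliary transitions with the learned reward model.
    with torch.no_grad():
        r = torch.sigmoid(reward_fn(s, a))   # estimated expert-likeness in (0, 1)

    # Temporal-difference learning on the learned reward: the value
    # function propagates reward through time, so even low-quality
    # transitions reveal which states lead to high long-term return.
    td_target = r + gamma * value(s_next).detach()
    value_loss = F.mse_loss(value(s), td_target)
    value_opt.zero_grad(); value_loss.backward(); value_opt.step()

    # Weighted behavioral cloning: expert samples get weight 1; high-reward
    # auxiliary samples are imitated in proportion to r; the remaining
    # samples contribute through an advantage-style weight (an AWR-like
    # choice assumed here) that steers the policy toward high-value states.
    with torch.no_grad():
        adv = td_target - value(s)
        w = torch.where(r > thresh, r, torch.exp(adv).clamp(max=10.0))
    bc_loss = -policy.log_prob(s_e, a_e).mean() - (w * policy.log_prob(s, a)).mean()
    policy_opt.zero_grad(); bc_loss.backward(); policy_opt.step()
    return bc_loss.item(), value_loss.item()
```

Here `policy.log_prob(states, actions)` is assumed to return per-sample log-likelihoods, and `value` and `reward_fn` are ordinary networks returning one scalar per sample; the clamp on `exp(adv)` is a standard stabilizer, not something the abstract specifies.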
doi_str_mv | 10.48550/arxiv.2410.03626 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2410.03626 |
language | eng |
recordid | cdi_arxiv_primary_2410_03626 |
source | arXiv.org |
subjects | Computer Science - Learning |
title | Robust Offline Imitation Learning from Diverse Auxiliary Data |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-16T09%3A58%3A45IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Robust%20Offline%20Imitation%20Learning%20from%20Diverse%20Auxiliary%20Data&rft.au=Ghosh,%20Udita&rft.date=2024-10-04&rft_id=info:doi/10.48550/arxiv.2410.03626&rft_dat=%3Carxiv_GOX%3E2410_03626%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |