Robust Offline Imitation Learning from Diverse Auxiliary Data

Offline imitation learning enables learning a policy solely from a set of expert demonstrations, without any environment interaction. To alleviate the issue of distribution shift arising from the small amount of expert data, recent works incorporate large numbers of auxiliary demonstrations alongside the expert data. However, the performance of these approaches relies on assumptions about the quality and composition of the auxiliary data, and they are rarely successful when those assumptions do not hold. To address this limitation, we propose Robust Offline Imitation from Diverse Auxiliary Data (ROIDA). ROIDA first identifies high-quality transitions from the entire auxiliary dataset using a learned reward function. These high-reward samples are combined with the expert demonstrations for weighted behavioral cloning. For lower-quality samples, ROIDA applies temporal difference learning to steer the policy towards high-reward states, improving long-term returns. This two-pronged approach enables our framework to effectively leverage both high- and low-quality data without any assumptions. Extensive experiments validate that ROIDA achieves robust and consistent performance across multiple auxiliary datasets with diverse ratios of expert and non-expert demonstrations. ROIDA effectively leverages unlabeled auxiliary data, outperforming prior methods reliant on specific data assumptions.
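The abstract describes a two-part objective: reward-weighted behavioral cloning on expert data plus high-reward auxiliary transitions, and temporal-difference (TD) learning on the rest. The record does not specify the reward model, threshold, or loss weighting, so the PyTorch sketch below is a minimal illustration under assumed choices: a discriminator-style reward trained to separate expert from auxiliary transitions, a hypothetical cutoff `tau`, a deterministic policy trained with MSE behavioral cloning, and equal loss weights. It is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

obs_dim, act_dim = 17, 6          # hypothetical dimensions
gamma, tau = 0.99, 0.7            # discount; tau is an assumed reward cutoff

reward_net = mlp(obs_dim + act_dim, 1)  # learned reward (assumed: an expert-
policy     = mlp(obs_dim, act_dim)      # vs-auxiliary discriminator)
q_net      = mlp(obs_dim + act_dim, 1)

def reward_loss(expert_sa, aux_sa):
    """Assumed reward learning: score expert transitions as positive and
    unlabeled auxiliary transitions as negative (discriminator-style)."""
    pos, neg = reward_net(expert_sa), reward_net(aux_sa)
    return (F.binary_cross_entropy_with_logits(pos, torch.ones_like(pos)) +
            F.binary_cross_entropy_with_logits(neg, torch.zeros_like(neg)))

def roida_style_loss(exp_s, exp_a, aux_s, aux_a, aux_s_next):
    # 1) Score every auxiliary transition with the learned reward.
    with torch.no_grad():
        r = torch.sigmoid(reward_net(torch.cat([aux_s, aux_a], -1))).squeeze(-1)
    high = r >= tau                       # "high-quality" transitions

    # 2) Weighted behavioral cloning: expert data plus high-reward samples,
    #    the latter weighted by their estimated reward.
    bc = F.mse_loss(policy(exp_s), exp_a)
    if high.any():
        sq_err = ((policy(aux_s[high]) - aux_a[high]) ** 2).mean(-1)
        bc = bc + (r[high] * sq_err).mean()

    # 3) TD learning on all auxiliary data so lower-quality transitions still
    #    inform the critic about long-term returns.
    q = q_net(torch.cat([aux_s, aux_a], -1)).squeeze(-1)
    with torch.no_grad():
        next_a = policy(aux_s_next)
        target = r + gamma * q_net(torch.cat([aux_s_next, next_a], -1)).squeeze(-1)
    td = F.mse_loss(q, target)

    # 4) Steer the policy toward high-value states on the lower-quality samples.
    low = ~high
    steer = (-q_net(torch.cat([aux_s[low], policy(aux_s[low])], -1)).mean()
             if low.any() else torch.tensor(0.0))

    return bc + td + steer                # assumed equal weighting
```

In a full training loop one would plausibly pretrain reward_net with reward_loss, then minimize roida_style_loss with separate optimizers for the policy and critic, typically adding a target Q-network to stabilize the TD backup.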

Bibliographic Details
Main Authors: Ghosh, Udita; Raychaudhuri, Dripta S; Li, Jiachen; Karydis, Konstantinos; Roy-Chowdhury, Amit K
Format: Article
Language: English
Subjects: Computer Science - Learning
DOI: 10.48550/arxiv.2410.03626
Published: 2024-10-04
Online Access: https://arxiv.org/abs/2410.03626