Robust Offline Imitation Learning from Diverse Auxiliary Data

Offline imitation learning enables learning a policy solely from a set of expert demonstrations, without any environment interaction. To alleviate the issue of distribution shift arising from the small amount of expert data, recent works incorporate large numbers of auxiliary demonstrations alongside the expert data. However, the performance of these approaches relies on assumptions about the quality and composition of the auxiliary data, and they are rarely successful when those assumptions do not hold. To address this limitation, we propose Robust Offline Imitation from Diverse Auxiliary Data (ROIDA). ROIDA first identifies high-quality transitions from the entire auxiliary dataset using a learned reward function. These high-reward samples are combined with the expert demonstrations for weighted behavioral cloning. For lower-quality samples, ROIDA applies temporal difference learning to steer the policy towards high-reward states, improving long-term returns. This two-pronged approach enables our framework to effectively leverage both high- and low-quality data without any assumptions. Extensive experiments validate that ROIDA achieves robust and consistent performance across multiple auxiliary datasets with diverse ratios of expert and non-expert demonstrations. ROIDA effectively leverages unlabeled auxiliary data, outperforming prior methods reliant on specific data assumptions.
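The abstract describes a two-part objective: reward-weighted behavioral cloning on expert data plus high-reward auxiliary transitions, and temporal-difference (TD) learning on the rest. The record does not specify the reward model, threshold, or loss weighting, so the PyTorch sketch below is a minimal illustration under assumed choices: a discriminator-style reward trained to separate expert from auxiliary transitions, a hypothetical cutoff `tau`, a deterministic policy trained with MSE behavioral cloning, and equal loss weights. It is not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

obs_dim, act_dim = 17, 6          # hypothetical dimensions
gamma, tau = 0.99, 0.7            # discount; tau is an assumed reward cutoff

reward_net = mlp(obs_dim + act_dim, 1)  # learned reward (assumed: an expert-
policy     = mlp(obs_dim, act_dim)      # vs-auxiliary discriminator)
q_net      = mlp(obs_dim + act_dim, 1)

def reward_loss(expert_sa, aux_sa):
    """Assumed reward learning: score expert transitions as positive and
    unlabeled auxiliary transitions as negative (discriminator-style)."""
    pos, neg = reward_net(expert_sa), reward_net(aux_sa)
    return (F.binary_cross_entropy_with_logits(pos, torch.ones_like(pos)) +
            F.binary_cross_entropy_with_logits(neg, torch.zeros_like(neg)))

def roida_style_loss(exp_s, exp_a, aux_s, aux_a, aux_s_next):
    # 1) Score every auxiliary transition with the learned reward.
    with torch.no_grad():
        r = torch.sigmoid(reward_net(torch.cat([aux_s, aux_a], -1))).squeeze(-1)
    high = r >= tau                       # "high-quality" transitions

    # 2) Weighted behavioral cloning: expert data plus high-reward samples,
    #    the latter weighted by their estimated reward.
    bc = F.mse_loss(policy(exp_s), exp_a)
    if high.any():
        sq_err = ((policy(aux_s[high]) - aux_a[high]) ** 2).mean(-1)
        bc = bc + (r[high] * sq_err).mean()

    # 3) TD learning on all auxiliary data so lower-quality transitions still
    #    inform the critic about long-term returns.
    q = q_net(torch.cat([aux_s, aux_a], -1)).squeeze(-1)
    with torch.no_grad():
        next_a = policy(aux_s_next)
        target = r + gamma * q_net(torch.cat([aux_s_next, next_a], -1)).squeeze(-1)
    td = F.mse_loss(q, target)

    # 4) Steer the policy toward high-value states on the lower-quality samples.
    low = ~high
    steer = (-q_net(torch.cat([aux_s[low], policy(aux_s[low])], -1)).mean()
             if low.any() else torch.tensor(0.0))

    return bc + td + steer                # assumed equal weighting
```

In a full training loop one would plausibly pretrain reward_net with reward_loss, then minimize roida_style_loss with separate optimizers for the policy and critic, typically adding a target Q-network to stabilize the TD backup.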

Bibliographic Details
Main Authors: Ghosh, Udita; Raychaudhuri, Dripta S; Li, Jiachen; Karydis, Konstantinos; Roy-Chowdhury, Amit K
Format: Article
Language: English
Subjects: Computer Science - Learning
DOI: 10.48550/arxiv.2410.03626
Published: 2024-10-04
Online Access: https://arxiv.org/abs/2410.03626