Adversarial Environment Design via Regret-Guided Diffusion Models

Training agents that are robust to environmental changes remains a significant challenge in deep reinforcement learning (RL). Unsupervised environment design (UED) has recently emerged to address this issue by generating a set of training environments tailored to the agent's capabilities. While prior works demonstrate that UED has the potential to produce a robust policy, their performance is constrained by the capabilities of the environment generator. To this end, we propose a novel UED algorithm: adversarial environment design via regret-guided diffusion models (ADD). The proposed method guides a diffusion-based environment generator with the agent's regret to produce environments that the agent finds challenging but conducive to further improvement. By exploiting the representation power of diffusion models, ADD can directly generate adversarial environments while maintaining the diversity of training environments, enabling the agent to effectively learn a robust policy. Our experimental results demonstrate that the proposed method successfully generates an instructive curriculum of environments, outperforming UED baselines in zero-shot generalization across novel, out-of-distribution environments. Project page: https://rllab-snu.github.io/projects/ADD
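
As a rough illustration of the idea in the abstract, the sketch below shows one standard way regret guidance could be wired into DDPM-style sampling (a classifier-guidance form). It is not the authors' implementation: the denoiser and regret_estimate modules, the additive guidance term, and all hyperparameters are assumptions made for this sketch.

    import torch

    def regret_guided_sample(denoiser, regret_estimate, betas, shape,
                             guidance_scale=1.0):
        # DDPM-style ancestral sampling with an additive regret-gradient term.
        # regret_estimate stands in for a learned estimate of the agent's
        # regret, regret(theta) ~ V_theta(pi*) - V_theta(pi), evaluated on
        # noisy environment parameters.
        alphas = 1.0 - betas
        alpha_bars = torch.cumprod(alphas, dim=0)
        x = torch.randn(shape)  # pure noise over environment parameters
        with torch.no_grad():
            for t in reversed(range(len(betas))):
                eps = denoiser(x, t)  # unconditional noise prediction

                # Gradient of estimated regret w.r.t. the current noisy sample.
                with torch.enable_grad():
                    x_in = x.detach().requires_grad_(True)
                    grad = torch.autograd.grad(
                        regret_estimate(x_in, t).sum(), x_in)[0]

                # Standard DDPM posterior mean, shifted toward higher regret.
                mean = (x - betas[t] / torch.sqrt(1.0 - alpha_bars[t]) * eps) \
                       / torch.sqrt(alphas[t])
                mean = mean + guidance_scale * betas[t] * grad

                noise = torch.randn_like(x) if t > 0 else torch.zeros_like(x)
                x = mean + torch.sqrt(betas[t]) * noise
        return x  # decoded downstream into a concrete training environment

In such a scheme, guidance_scale would trade off how adversarial the sampled environments are against how closely they follow the learned environment distribution, which matches the abstract's goal of challenging yet diverse training environments.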

Bibliographic Details

Main authors: Chung, Hojun; Lee, Junseo; Kim, Minsoo; Kim, Dohyeong; Oh, Songhwai
Format: Article
Language: English
Subjects: Computer Science - Artificial Intelligence; Computer Science - Learning
DOI: 10.48550/arxiv.2410.19715
Source: arXiv.org
Online access: https://arxiv.org/abs/2410.19715