Domain Randomization via Entropy Maximization

Varying dynamics parameters in simulation is a popular Domain Randomization (DR) approach for overcoming the reality gap in Reinforcement Learning (RL). Nevertheless, DR heavily hinges on the choice of the sampling distribution of the dynamics parameters, since high variability is crucial to regularize the agent's behavior but notoriously leads to overly conservative policies when randomizing excessively.
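The core mechanism described in the abstract — widening the dynamics-parameter distribution (raising its entropy) only while the current policy's success probability stays high — can be sketched as a simple heuristic loop. This is an illustrative approximation, not the paper's actual constrained-optimization solver: the uniform parameterization, the function names, and the multiplicative widen/shrink factors are all assumptions made for the example.

```python
import numpy as np

def entropy_of_uniform(low, high):
    """Entropy of a product of independent uniform distributions
    over the box [low, high] (sum of per-dimension log-widths)."""
    return float(np.sum(np.log(high - low)))

def doraemon_style_update(low, high, success_rate,
                          alpha=0.9, widen=1.05, shrink=0.95):
    """One heuristic step: widen the dynamics-parameter box (raising
    its entropy) while the policy still succeeds often enough,
    otherwise shrink it back toward its center. Names and constants
    are hypothetical, for illustration only."""
    center = (low + high) / 2.0
    half = (high - low) / 2.0
    # Widen only if the success-probability constraint is satisfied.
    scale = widen if success_rate >= alpha else shrink
    half = half * scale
    return center - half, center + half

# Toy usage: a success rate above the threshold widens the range.
low, high = np.array([0.9]), np.array([1.1])
new_low, new_high = doraemon_style_update(low, high, success_rate=0.95)
```

Under this sketch, repeated successful iterations monotonically grow the sampling box, mirroring the gradual diversity increase the abstract describes.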

Detailed Description

Saved in:
Bibliographic Details
Main Authors: Tiboni, Gabriele, Klink, Pascal, Peters, Jan, Tommasi, Tatiana, D'Eramo, Carlo, Chalvatzaki, Georgia
Format: Article
Language: eng
Subjects:
Online Access: Order full text
creator Tiboni, Gabriele
Klink, Pascal
Peters, Jan
Tommasi, Tatiana
D'Eramo, Carlo
Chalvatzaki, Georgia
description Varying dynamics parameters in simulation is a popular Domain Randomization (DR) approach for overcoming the reality gap in Reinforcement Learning (RL). Nevertheless, DR heavily hinges on the choice of the sampling distribution of the dynamics parameters, since high variability is crucial to regularize the agent's behavior but notoriously leads to overly conservative policies when randomizing excessively. In this paper, we propose a novel approach to address sim-to-real transfer, which automatically shapes dynamics distributions during training in simulation without requiring real-world data. We introduce DOmain RAndomization via Entropy MaximizatiON (DORAEMON), a constrained optimization problem that directly maximizes the entropy of the training distribution while retaining generalization capabilities. In achieving this, DORAEMON gradually increases the diversity of sampled dynamics parameters as long as the probability of success of the current policy is sufficiently high. We empirically validate the consistent benefits of DORAEMON in obtaining highly adaptive and generalizable policies, i.e. solving the task at hand across the widest range of dynamics parameters, as opposed to representative baselines from the DR literature. Notably, we also demonstrate the Sim2Real applicability of DORAEMON through its successful zero-shot transfer in a robotic manipulation setup under unknown real-world parameters.
doi_str_mv 10.48550/arxiv.2311.01885
format Article
creationdate 2023-11-03
rights http://arxiv.org/licenses/nonexclusive-distrib/1.0
sourcetype Open Access Repository
backlink https://arxiv.org/abs/2311.01885
fulltext fulltext_linktorsrc
identifier DOI: 10.48550/arxiv.2311.01885
language eng
recordid cdi_arxiv_primary_2311_01885
source arXiv.org
subjects Computer Science - Learning
Computer Science - Robotics
title Domain Randomization via Entropy Maximization
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-27T21%3A51%3A47IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-arxiv_GOX&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=Domain%20Randomization%20via%20Entropy%20Maximization&rft.au=Tiboni,%20Gabriele&rft.date=2023-11-03&rft_id=info:doi/10.48550/arxiv.2311.01885&rft_dat=%3Carxiv_GOX%3E2311_01885%3C/arxiv_GOX%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true