A Survey on Policy Search Algorithms for Learning Robot Controllers in a Handful of Trials
Most policy search (PS) algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the opposite extreme of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the term "big data," we refer to this challenge as "micro-data reinforcement learning." In this article, we show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or of the system dynamics (e.g., model-based PS), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots, designing generic priors, and optimizing the computing time.
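The two surrogate-model strategies named in the abstract can be made concrete with short sketches. The first is a minimal example of learning a surrogate of the expected reward with Bayesian optimization, assuming a low-dimensional policy parameter vector and a scalar episodic return; the toy reward landscape, the dimension `DIM`, and `episodic_return` are hypothetical stand-ins for a real robot trial, not code from the paper.

```python
# Sketch of micro-data policy search via Bayesian optimization
# (reward surrogate). `episodic_return` is a hypothetical stand-in
# for one trial on a physical robot.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
DIM = 2  # e.g., two gains of a hand-designed controller


def episodic_return(theta):
    """One (noisy) trial: episodic reward of the policy parameters theta."""
    return -np.sum((theta - 0.3) ** 2) + 0.01 * rng.standard_normal()


# Seed the reward surrogate with a few random trials.
X = rng.uniform(-1.0, 1.0, size=(5, DIM))
y = np.array([episodic_return(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(10):  # about "a dozen" real trials in total
    gp.fit(X, y)
    # Upper-confidence-bound acquisition, maximized over random candidates:
    # the optimizer queries the cheap surrogate, not the real system.
    cand = rng.uniform(-1.0, 1.0, size=(2000, DIM))
    mu, sigma = gp.predict(cand, return_std=True)
    theta = cand[np.argmax(mu + 2.0 * sigma)]
    X = np.vstack([X, theta])
    y = np.append(y, episodic_return(theta))  # one real trial per iteration

print("best policy parameters:", X[np.argmax(y)])
```

Each iteration spends exactly one real trial, while the acquisition maximization runs entirely on the surrogate; that is what keeps the loop in the micro-data regime. The second sketch illustrates model-based policy search under the same caveats: the surrogate now models the dynamics rather than the reward, and the one-dimensional system in `real_step` and the linear policy are assumptions made only for the example.

```python
# Sketch of model-based policy search (dynamics surrogate).
# `real_step` is a hypothetical toy system standing in for the robot.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)


def real_step(s, a):
    """Unknown true dynamics: one state transition of a toy 1-D system."""
    return 0.9 * s + 0.5 * np.tanh(a) + 0.01 * rng.standard_normal()


def rollout_return(gain, step_fn, s0=1.0, horizon=20):
    """Return of the linear policy a = -gain * s under the given dynamics."""
    s, ret = s0, 0.0
    for _ in range(horizon):
        s = step_fn(s, -gain * s)
        ret -= s**2  # reward: drive the state to zero
    return ret


# A single batch of real transitions fits the dynamics model ...
S = rng.uniform(-1.0, 1.0, size=40)
A = rng.uniform(-1.0, 1.0, size=40)
S_next = np.array([real_step(s, a) for s, a in zip(S, A)])
model = GaussianProcessRegressor(normalize_y=True)
model.fit(np.column_stack([S, A]), S_next)


# ... and the policy is then optimized entirely inside the learned model.
def model_step(s, a):
    return model.predict(np.array([[s, a]]))[0]


best_gain = max(np.linspace(0.0, 2.0, 50),
                key=lambda k: rollout_return(k, model_step))
print("gain found in the model:", best_gain,
      "| real return:", rollout_return(best_gain, real_step))
```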
Saved in:

Published in: | IEEE Transactions on Robotics, 2020-04, Vol. 36 (2), p. 328-347 |
---|---|
Main Authors: | Chatzilygeroudis, Konstantinos; Vassiliades, Vassilis; Stulp, Freek; Calinon, Sylvain; Mouret, Jean-Baptiste |
Format: | Article |
Language: | English |
Subjects: | Algorithms; Artificial Intelligence; Automatic; Autonomous agents; Biological system modeling; Computer Science; Computer simulation; Computing time; Data models; Dynamic models; Engineering Sciences; Europe; Heuristic algorithms; Learning and adaptive systems; Machine learning; Micro-data policy search (MDPS); Optimization; Robot kinematics; Robot learning; Robotics; Robots; Search algorithms; Simulators; Trajectory |
Online Access: | Full text |
DOI: | 10.1109/TRO.2019.2958211 |
ISSN: | 1552-3098 |
EISSN: | 1941-0468 |
Source: | IEEE Electronic Library (IEL) |