A Survey on Policy Search Algorithms for Learning Robot Controllers in a Handful of Trials

Most policy search (PS) algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the extreme other end of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the word "big-data," we refer to this challenge as "micro-data reinforcement learning." In this article, we show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or the dynamical model (e.g., model-based PS), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots, designing generic priors, and optimizing the computing time.
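The second strategy in the abstract (a data-driven surrogate of the expected reward, queried by Bayesian optimization so the robot itself runs only a handful of episodes) can be sketched in a few dozen lines. This is a minimal, self-contained illustration under stated assumptions, not code from the paper: the quadratic `episode` reward standing in for a costly robot trial, the RBF length scale, the UCB exploration weight, and the grid search over a single 1-D policy parameter are all illustrative choices.

```python
import math
import random

def rbf(a, b, length=0.15):
    """Squared-exponential kernel on scalar policy parameters."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Gaussian elimination with partial pivoting (small systems only)."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def gp_posterior(xs, ys, x_star, noise=1e-4):
    """Gaussian-process posterior mean and variance at x_star."""
    K = [[rbf(a, b) + (noise if i == j else 0.0)
          for j, b in enumerate(xs)] for i, a in enumerate(xs)]
    alpha = solve(K, ys)                       # K^{-1} y
    k_star = [rbf(x, x_star) for x in xs]
    mean = sum(k * a for k, a in zip(k_star, alpha))
    v = solve(K, k_star)                       # K^{-1} k_star
    var = max(rbf(x_star, x_star) - sum(k * w for k, w in zip(k_star, v)), 1e-12)
    return mean, var

def episode(theta):
    """Stand-in for one costly trial on a physical robot (optimum at 0.7)."""
    return -(theta - 0.7) ** 2

def micro_data_bo(budget=10, seed=0):
    """Bayesian optimization of the expected reward with ~10 real episodes."""
    random.seed(seed)
    xs = [random.random() for _ in range(2)]   # two seed trials
    ys = [episode(x) for x in xs]
    grid = [i / 200.0 for i in range(201)]     # candidate policy parameters
    for _ in range(budget - len(xs)):
        # UCB acquisition: the optimizer queries the surrogate, not the robot
        def ucb(x):
            m, v = gp_posterior(xs, ys, x)
            return m + 2.0 * math.sqrt(v)
        x_next = max(grid, key=ucb)
        xs.append(x_next)
        ys.append(episode(x_next))             # one more real episode
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best]
```

With a budget of ten episodes, this sketch typically lands within a few hundredths of the optimum at theta = 0.7; scaling the same idea to high-dimensional policies on real robots is exactly where the priors surveyed in the article come in.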

Bibliographic Details
Published in: IEEE Transactions on Robotics, 2020-04, Vol. 36 (2), pp. 328-347
Authors: Chatzilygeroudis, Konstantinos; Vassiliades, Vassilis; Stulp, Freek; Calinon, Sylvain; Mouret, Jean-Baptiste
Format: Article
Language: eng
Online access: Full text
DOI: 10.1109/TRO.2019.2958211
ISSN: 1552-3098
EISSN: 1941-0468
Record ID: cdi_hal_primary_oai_HAL_hal_02393432v1
Source: IEEE Electronic Library (IEL)
Subjects:
Algorithms
Artificial Intelligence
Automatic
Autonomous agents
Biological system modeling
Computer Science
Computer simulation
Computing time
Data models
Dynamic models
Engineering Sciences
Europe
Heuristic algorithms
learning and adaptive systems
Machine learning
micro-data policy search (MDPS)
Optimization
Robot kinematics
robot learning
Robotics
Robots
Search algorithms
Simulators
Trajectory