A Survey on Policy Search Algorithms for Learning Robot Controllers in a Handful of Trials
Most policy search (PS) algorithms require thousands of training episodes to find an effective policy, which is often infeasible with a physical robot. This survey article focuses on the opposite extreme of the spectrum: how can a robot adapt with only a handful of trials (a dozen) and a few minutes? By analogy with the term "big data," we refer to this challenge as "micro-data reinforcement learning." In this article, we show that a first strategy is to leverage prior knowledge on the policy structure (e.g., dynamic movement primitives), on the policy parameters (e.g., demonstrations), or on the dynamics (e.g., simulators). A second strategy is to create data-driven surrogate models of the expected reward (e.g., Bayesian optimization) or of the system dynamics (e.g., model-based PS), so that the policy optimizer queries the model instead of the real system. Overall, all successful micro-data algorithms combine these two strategies by varying the kind of model and prior knowledge. The current scientific challenges essentially revolve around scaling up to complex robots, designing generic priors, and optimizing the computing time.
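The two surrogate-model strategies named in the abstract can be made concrete with short sketches. The first is a minimal example of learning a surrogate of the expected reward with Bayesian optimization, assuming a low-dimensional policy parameter vector and a scalar episodic return; the toy reward landscape, the dimension `DIM`, and `episodic_return` are hypothetical stand-ins for a real robot trial, not code from the paper.

```python
# Sketch of micro-data policy search via Bayesian optimization
# (reward surrogate). `episodic_return` is a hypothetical stand-in
# for one trial on a physical robot.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import Matern

rng = np.random.default_rng(0)
DIM = 2  # e.g., two gains of a hand-designed controller


def episodic_return(theta):
    """One (noisy) trial: episodic reward of the policy parameters theta."""
    return -np.sum((theta - 0.3) ** 2) + 0.01 * rng.standard_normal()


# Seed the reward surrogate with a few random trials.
X = rng.uniform(-1.0, 1.0, size=(5, DIM))
y = np.array([episodic_return(x) for x in X])

gp = GaussianProcessRegressor(kernel=Matern(nu=2.5), normalize_y=True)

for _ in range(10):  # about "a dozen" real trials in total
    gp.fit(X, y)
    # Upper-confidence-bound acquisition, maximized over random candidates:
    # the optimizer queries the cheap surrogate, not the real system.
    cand = rng.uniform(-1.0, 1.0, size=(2000, DIM))
    mu, sigma = gp.predict(cand, return_std=True)
    theta = cand[np.argmax(mu + 2.0 * sigma)]
    X = np.vstack([X, theta])
    y = np.append(y, episodic_return(theta))  # one real trial per iteration

print("best policy parameters:", X[np.argmax(y)])
```

Each iteration spends exactly one real trial, while the acquisition maximization runs entirely on the surrogate; that is what keeps the loop in the micro-data regime. The second sketch illustrates model-based policy search under the same caveats: the surrogate now models the dynamics rather than the reward, and the one-dimensional system in `real_step` and the linear policy are assumptions made only for the example.

```python
# Sketch of model-based policy search (dynamics surrogate).
# `real_step` is a hypothetical toy system standing in for the robot.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

rng = np.random.default_rng(1)


def real_step(s, a):
    """Unknown true dynamics: one state transition of a toy 1-D system."""
    return 0.9 * s + 0.5 * np.tanh(a) + 0.01 * rng.standard_normal()


def rollout_return(gain, step_fn, s0=1.0, horizon=20):
    """Return of the linear policy a = -gain * s under the given dynamics."""
    s, ret = s0, 0.0
    for _ in range(horizon):
        s = step_fn(s, -gain * s)
        ret -= s**2  # reward: drive the state to zero
    return ret


# A single batch of real transitions fits the dynamics model ...
S = rng.uniform(-1.0, 1.0, size=40)
A = rng.uniform(-1.0, 1.0, size=40)
S_next = np.array([real_step(s, a) for s, a in zip(S, A)])
model = GaussianProcessRegressor(normalize_y=True)
model.fit(np.column_stack([S, A]), S_next)


# ... and the policy is then optimized entirely inside the learned model.
def model_step(s, a):
    return model.predict(np.array([[s, a]]))[0]


best_gain = max(np.linspace(0.0, 2.0, 50),
                key=lambda k: rollout_return(k, model_step))
print("gain found in the model:", best_gain,
      "| real return:", rollout_return(best_gain, real_step))
```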
Saved in:

Published in: | IEEE Transactions on Robotics, 2020-04, Vol. 36 (2), p. 328-347 |
---|---|
Main Authors: | Chatzilygeroudis, Konstantinos; Vassiliades, Vassilis; Stulp, Freek; Calinon, Sylvain; Mouret, Jean-Baptiste |
Format: | Article |
Language: | English |
Subjects: | Algorithms; Artificial Intelligence; Automatic; Autonomous agents; Biological system modeling; Computer Science; Computer simulation; Computing time; Data models; Dynamic models; Engineering Sciences; Europe; Heuristic algorithms; Learning and adaptive systems; Machine learning; Micro-data policy search (MDPS); Optimization; Robot kinematics; Robot learning; Robotics; Robots; Search algorithms; Simulators; Trajectory |
Online Access: | Full text |
DOI: | 10.1109/TRO.2019.2958211 |
ISSN: | 1552-3098 |
EISSN: | 1941-0468 |
Source: | IEEE Electronic Library (IEL) |