Sim-to-Real Transfer for Biped Locomotion

We present a new approach for transfer of dynamic robot control policies such as biped locomotion from simulation to real hardware. Key to our approach is to perform system identification of the model parameters {\mu} of the hardware (e.g. friction, center-of-mass) in two distinct stages, before pol...
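
The post-sysID step named in the abstract (searching for the latent value {\eta} that gives the best performance of the trained policy on hardware) can be illustrated with a small, self-contained Python sketch. Everything in it is an assumption made for illustration, not the authors' implementation: {\eta} is treated as one-dimensional, `toy_rollout_return` is a hypothetical stand-in for running the PUP on the robot and measuring task return, and the Gaussian-process surrogate with an RBF kernel and an upper-confidence-bound acquisition rule is one common form of Bayesian optimization, which the abstract does not specify further.

```python
import math

def rbf(a, b, length=0.3):
    """Squared-exponential kernel between two scalar latent values."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def gp_posterior(xs, ys, x_new, jitter=1e-4):
    """Zero-mean GP posterior mean and variance at x_new given (xs, ys)."""
    K = [[rbf(a, b) + (jitter if i == j else 0.0) for j, b in enumerate(xs)]
         for i, a in enumerate(xs)]
    k_star = [rbf(a, x_new) for a in xs]
    alpha = solve(K, ys)       # K^-1 y
    v = solve(K, k_star)       # K^-1 k*
    mean = sum(k * a for k, a in zip(k_star, alpha))
    var = max(rbf(x_new, x_new) - sum(k * w for k, w in zip(k_star, v)), 0.0)
    return mean, var

def post_sysid(rollout_return, low, high, budget=12, n_init=3, n_cand=41, beta=2.0):
    """Pick the latent eta in [low, high] with the best observed return.

    rollout_return(eta) is assumed to deploy the eta-conditioned policy
    once and report a scalar return; budget is the number of rollouts.
    """
    cands = [low + (high - low) * i / (n_cand - 1) for i in range(n_cand)]
    # Start from a few evenly spaced candidates.
    xs = [cands[i * (n_cand - 1) // (n_init - 1)] for i in range(n_init)]
    ys = [rollout_return(x) for x in xs]
    while len(xs) < budget:
        y_mean = sum(ys) / len(ys)
        centered = [y - y_mean for y in ys]

        def ucb(x):
            m, var = gp_posterior(xs, centered, x)
            return m + y_mean + beta * math.sqrt(var)

        # Only consider candidates that have not been tried yet.
        remaining = [c for c in cands if c not in xs]
        x_next = max(remaining, key=ucb)
        xs.append(x_next)
        ys.append(rollout_return(x_next))
    best = max(range(len(xs)), key=lambda i: ys[i])
    return xs[best], ys[best], list(zip(xs, ys))

def toy_rollout_return(eta):
    """Hypothetical stand-in for a hardware rollout; peaks at eta = 0.3."""
    return -(eta - 0.3) ** 2

best_eta, best_return, history = post_sysid(toy_rollout_return, -1.0, 1.0)
print("best eta:", best_eta, "return:", best_return, "rollouts:", len(history))
```

The sketch keeps the number of rollouts small and fixed (`budget`), since each evaluation would correspond to a trial on physical hardware; a real setup would replace `toy_rollout_return` with an actual deployment and would likely use a multi-dimensional {\eta}.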

Detailed description


Bibliographic details
Published in:	arXiv.org 2019-08
Main authors:	Yu, Wenhao; Visak CV Kumar; Turk, Greg; Liu, C Karen
Format:	Article
Language:	eng
Subjects:	Bayesian analysis; Computer simulation; Conditioning; Controllers; Hardware; Learning; Locomotion; Parameter identification; Policies; Robot control; Robot dynamics; Robots; System identification; Trajectory optimization
Online access:	Full text

container_end_page
container_issue
container_start_page
container_title	arXiv.org
container_volume
creator	Yu, Wenhao ; Visak CV Kumar ; Turk, Greg ; Liu, C Karen
description	We present a new approach for transferring dynamic robot control policies, such as biped locomotion, from simulation to real hardware. Key to our approach is to perform system identification of the model parameters {\mu} of the hardware (e.g. friction, center-of-mass) in two distinct stages: before policy learning (pre-sysID) and after policy learning (post-sysID). Pre-sysID begins by collecting trajectories from the physical hardware based on a set of generic motion sequences. Because the trajectories may not be related to the task of interest, pre-sysID does not attempt to accurately identify the true value of {\mu}, but only to approximate the range of {\mu} to guide the policy learning. Next, a Projected Universal Policy (PUP) is created by simultaneously training a network that projects {\mu} to a low-dimensional latent variable {\eta} and a family of policies that are conditioned on {\eta}. The second round of system identification (post-sysID) is then carried out by deploying the PUP on the robot hardware using task-relevant trajectories. We use Bayesian Optimization to determine the value of {\eta} that optimizes the performance of the PUP on the real hardware. We have used this approach to create three successful biped locomotion controllers (walk forward, walk backwards, walk sideways) on the Darwin OP2 robot.
format	Article
date	2019-08-25
oa	free_for_read
publisher	Ithaca: Cornell University Library, arXiv.org
rights	2019. This work is published under http://arxiv.org/licenses/nonexclusive-distrib/1.0/ (the “License”). Notwithstanding the ProQuest Terms and Conditions, you may use this content in accordance with the terms of the License.
fulltext	fulltext
identifier	EISSN: 2331-8422
ispartof	arXiv.org, 2019-08
issn	2331-8422
language	eng
recordid	cdi_proquest_journals_2188080554
source	Free E- Journals
subjects	Bayesian analysis ; Computer simulation ; Conditioning ; Controllers ; Hardware ; Learning ; Locomotion ; Parameter identification ; Policies ; Robot control ; Robot dynamics ; Robots ; System identification ; Trajectory optimization
title	Sim-to-Real Transfer for Biped Locomotion
url	https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-28T19%3A13%3A37IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=Sim-to-Real%20Transfer%20for%20Biped%20Locomotion&rft.jtitle=arXiv.org&rft.au=Yu,%20Wenhao&rft.date=2019-08-25&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E2188080554%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=2188080554&rft_id=info:pmid/&rfr_iscdi=true