Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control
Learning a universal policy across different robot morphologies can significantly improve learning efficiency and enable zero-shot generalization to unseen morphologies. However, learning a highly performant universal policy requires sophisticated architectures like transformers (TF) that have larger memory and computational cost than simpler multi-layer perceptrons (MLP). To achieve both good performance like TF and high efficiency like MLP at inference time, we propose HyperDistill, which consists of: (1) A morphology-conditioned hypernetwork (HN) that generates robot-wise MLP policies, and (2) A policy distillation approach that is essential for successful training. We show that on UNIMAL, a benchmark with hundreds of diverse morphologies, HyperDistill performs as well as a universal TF teacher policy on both training and unseen test robots, but reduces model size by 6-14 times, and computational cost by 67-160 times in different environments. Our analysis attributes the efficiency advantage of HyperDistill at inference time to knowledge decoupling, i.e., the ability to decouple inter-task and intra-task knowledge, a general principle that could also be applied to improve inference efficiency in other domains.
Saved in:
Main authors: | Xiong, Zheng; Vuorio, Risto; Beck, Jacob; Zimmer, Matthieu; Shao, Kun; Whiteson, Shimon |
---|---|
Format: | Article |
Language: | eng |
Subjects: | Computer Science - Learning; Computer Science - Robotics |
Online access: | Order full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | |
container_volume | |
creator | Xiong, Zheng; Vuorio, Risto; Beck, Jacob; Zimmer, Matthieu; Shao, Kun; Whiteson, Shimon |
description | Learning a universal policy across different robot morphologies can
significantly improve learning efficiency and enable zero-shot generalization
to unseen morphologies. However, learning a highly performant universal policy
requires sophisticated architectures like transformers (TF) that have larger
memory and computational cost than simpler multi-layer perceptrons (MLP). To
achieve both good performance like TF and high efficiency like MLP at inference
time, we propose HyperDistill, which consists of: (1) A morphology-conditioned
hypernetwork (HN) that generates robot-wise MLP policies, and (2) A policy
distillation approach that is essential for successful training. We show that
on UNIMAL, a benchmark with hundreds of diverse morphologies, HyperDistill
performs as well as a universal TF teacher policy on both training and unseen
test robots, but reduces model size by 6-14 times, and computational cost by
67-160 times in different environments. Our analysis attributes the efficiency
advantage of HyperDistill at inference time to knowledge decoupling, i.e., the
ability to decouple inter-task and intra-task knowledge, a general principle
that could also be applied to improve inference efficiency in other domains. |
doi_str_mv | 10.48550/arxiv.2402.06570 |
format | Article |
fulltext | fulltext_linktorsrc |
identifier | DOI: 10.48550/arxiv.2402.06570 |
ispartof | |
issn | |
language | eng |
recordid | cdi_arxiv_primary_2402_06570 |
source | arXiv.org |
subjects | Computer Science - Learning; Computer Science - Robotics |
title | Distilling Morphology-Conditioned Hypernetworks for Efficient Universal Morphology Control |
url | https://arxiv.org/abs/2402.06570 |
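
The abstract above describes HyperDistill as a morphology-conditioned hypernetwork that generates a robot-wise MLP policy, trained by distilling a transformer teacher. The following is a minimal sketch of that idea in PyTorch, assuming a fixed-size morphology embedding, a two-layer generated MLP, and a simple MSE distillation loss; the class names, dimensions, and loss choice are illustrative assumptions, not details taken from the paper or its code.

```python
# Minimal sketch (not the authors' implementation): a morphology-conditioned
# hypernetwork that generates the weights of a per-robot MLP policy, plus a
# behaviour-cloning style distillation loss against a teacher's actions.
import torch
import torch.nn as nn
import torch.nn.functional as F


class MorphologyHypernetwork(nn.Module):
    """Maps a fixed-size morphology embedding to the weights of a 2-layer MLP policy."""

    def __init__(self, morph_dim: int, obs_dim: int, act_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.obs_dim, self.act_dim, self.hidden_dim = obs_dim, act_dim, hidden_dim
        # Total parameter count of the generated MLP: two weight matrices plus biases.
        n_params = (obs_dim * hidden_dim + hidden_dim) + (hidden_dim * act_dim + act_dim)
        self.generator = nn.Sequential(
            nn.Linear(morph_dim, 512), nn.ReLU(), nn.Linear(512, n_params)
        )

    def forward(self, morph_emb: torch.Tensor):
        """Generate the MLP parameters for one morphology embedding."""
        flat = self.generator(morph_emb)
        h, o, a = self.hidden_dim, self.obs_dim, self.act_dim
        i = 0
        w1 = flat[i:i + o * h].view(h, o); i += o * h
        b1 = flat[i:i + h]; i += h
        w2 = flat[i:i + h * a].view(a, h); i += h * a
        b2 = flat[i:i + a]
        return w1, b1, w2, b2

    @staticmethod
    def policy(obs: torch.Tensor, params) -> torch.Tensor:
        """Run the generated MLP; at inference time only this cheap MLP is evaluated."""
        w1, b1, w2, b2 = params
        return F.linear(torch.tanh(F.linear(obs, w1, b1)), w2, b2)


def distillation_loss(student_actions: torch.Tensor, teacher_actions: torch.Tensor) -> torch.Tensor:
    """Regress the student's actions onto the teacher's actions on the same observations."""
    return F.mse_loss(student_actions, teacher_actions)


if __name__ == "__main__":
    hn = MorphologyHypernetwork(morph_dim=64, obs_dim=32, act_dim=8)
    morph_emb = torch.randn(64)           # embedding of one robot's morphology (assumed given)
    params = hn(morph_emb)                # generate that robot's MLP policy once per robot
    obs = torch.randn(16, 32)             # a batch of observations for that robot
    actions = MorphologyHypernetwork.policy(obs, params)
    teacher_actions = torch.randn(16, 8)  # stand-in for a transformer teacher's outputs
    loss = distillation_loss(actions, teacher_actions)
    loss.backward()                       # gradients flow back into the hypernetwork
```

This also illustrates the "knowledge decoupling" point from the abstract: the hypernetwork (inter-task knowledge) is only evaluated once per morphology, while per-step inference uses just the small generated MLP (intra-task knowledge).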