A Comparative Study on State-Action Spaces for Learning Viewpoint Selection and Manipulation with Diffusion Policy

Robotic manipulation tasks often rely on static cameras for perception, which can limit flexibility, particularly in scenarios like robotic surgery and cluttered environments where mounting static cameras is impractical. Ideally, robots could jointly learn a policy for dynamic viewpoint and manipulation. However, it remains unclear which state-action space is most suitable for this complex learning process. To enable manipulation with dynamic viewpoints and to better understand impacts from different state-action spaces on this policy learning process, we conduct a comparative study on the state-action spaces for policy learning and their impacts on the performance of visuomotor policies that integrate viewpoint selection with manipulation. Specifically, we examine the configuration space of the robotic system, the end-effector space with a dual-arm Inverse Kinematics (IK) solver, and the reduced end-effector space with a look-at IK solver to optimize rotation for viewpoint selection. We also assess variants with different rotation representations. Our results demonstrate that state-action spaces utilizing Euler angles with the look-at IK achieve superior task success rates compared to other spaces. Further analysis suggests that these performance differences are driven by inherent variations in the high-frequency components across different state-action spaces and rotation representations.
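The abstract's central comparison is easiest to picture in code. The sketch below is an illustrative assumption, not the authors' implementation: it builds a look-at rotation that points a wrist-mounted camera's forward axis at a manipulation target (the constraint a look-at IK solver would enforce for the viewpoint arm) and converts it into the Euler-angle and quaternion forms the study compares. The axis conventions, the use of scipy, and all names and numbers are hypothetical.

```python
# Minimal sketch (assumed, not the paper's code): a "look-at" rotation whose +z axis
# points from the camera toward a target, converted to Euler-angle and quaternion
# representations. Axis conventions and all values below are illustrative assumptions.
import numpy as np
from scipy.spatial.transform import Rotation as R

def look_at_rotation(eye, target, up=np.array([0.0, 0.0, 1.0])):
    """Return a 3x3 rotation matrix whose +z column points from `eye` toward `target`."""
    forward = target - eye
    forward = forward / np.linalg.norm(forward)
    right = np.cross(up, forward)
    right = right / np.linalg.norm(right)   # degenerate if forward is parallel to `up`
    true_up = np.cross(forward, right)
    # Columns are the camera frame's x (right), y (up), z (forward) axes in world coordinates.
    return np.stack([right, true_up, forward], axis=1)

eye = np.array([0.40, 0.00, 0.50])      # hypothetical camera position
target = np.array([0.60, 0.10, 0.05])   # hypothetical manipulation target

rot = R.from_matrix(look_at_rotation(eye, target))
print("Euler (xyz):", rot.as_euler("xyz"))  # 3 numbers per end effector
print("Quaternion :", rot.as_quat())        # 4 numbers (x, y, z, w) per end effector
```

Under such a look-at constraint, the viewpoint arm's orientation is derived from its position and the target rather than commanded directly, which is what shrinks the reduced end-effector space relative to the full dual-arm end-effector space; Euler angles encode the remaining rotation in three numbers, while quaternions use four.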


Bibliographic Details
Published in: arXiv.org, 2024-11
Main Authors: Sun, Xiatao; Fan, Francis; Chen, Yinxing; Rakita, Daniel
Format: Article
Language: eng
Subjects: Cameras; Comparative studies; Configuration management; End effectors; Euler angles; Inverse kinematics; Learning; Representations; Robot learning; Robotic surgery; Robotics; Rotation; Solvers; Task complexity
Online Access: Full Text
container_title arXiv.org
creator Sun, Xiatao; Fan, Francis; Chen, Yinxing; Rakita, Daniel
format Article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-11
issn 2331-8422
language eng
recordid cdi_proquest_journals_3108868884
source Free E-Journals
subjects Cameras
Comparative studies
Configuration management
End effectors
Euler angles
Inverse kinematics
Learning
Representations
Robot learning
Robotic surgery
Robotics
Rotation
Solvers
Task complexity
title A Comparative Study on State-Action Spaces for Learning Viewpoint Selection and Manipulation with Diffusion Policy
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-20T12%3A09%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=A%20Comparative%20Study%20on%20State-Action%20Spaces%20for%20Learning%20Viewpoint%20Selection%20and%20Manipulation%20with%20Diffusion%20Policy&rft.jtitle=arXiv.org&rft.au=Sun,%20Xiatao&rft.date=2024-11-13&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3108868884%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3108868884&rft_id=info:pmid/&rfr_iscdi=true