A Comparative Study on State-Action Spaces for Learning Viewpoint Selection and Manipulation with Diffusion Policy
Robotic manipulation tasks often rely on static cameras for perception, which can limit flexibility, particularly in scenarios like robotic surgery and cluttered environments where mounting static cameras is impractical. Ideally, robots could jointly learn a policy for dynamic viewpoint and manipulation. However, it remains unclear which state-action space is most suitable for this complex learning process. To enable manipulation with dynamic viewpoints and to better understand impacts from different state-action spaces on this policy learning process, we conduct a comparative study on the state-action spaces for policy learning and their impacts on the performance of visuomotor policies that integrate viewpoint selection with manipulation. Specifically, we examine the configuration space of the robotic system, the end-effector space with a dual-arm Inverse Kinematics (IK) solver, and the reduced end-effector space with a look-at IK solver to optimize rotation for viewpoint selection. We also assess variants with different rotation representations. Our results demonstrate that state-action spaces utilizing Euler angles with the look-at IK achieve superior task success rates compared to other spaces. Further analysis suggests that these performance differences are driven by inherent variations in the high-frequency components across different state-action spaces and rotation representations.
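The abstract describes a look-at IK solver that resolves the camera's rotation from its position and a gaze target, so the learned policy only needs to output a reduced end-effector state. The paper's solver is not reproduced in this record; the snippet below is only a minimal sketch of the look-at idea, assuming a camera whose +Z axis is the optical axis, world +Z as "up", and an XYZ Euler-angle output (the function name and conventions are illustrative, not from the paper).

```python
# Illustrative only -- not the paper's look-at IK solver. Assumes the camera's
# +Z axis is its optical axis and world +Z is "up"; returns "xyz" Euler angles.
import numpy as np
from scipy.spatial.transform import Rotation as R

def look_at_euler(camera_pos, target_pos, up=(0.0, 0.0, 1.0)):
    """Orientation ("xyz" Euler angles, radians) that points the camera's
    +Z axis from camera_pos toward target_pos."""
    forward = np.asarray(target_pos, float) - np.asarray(camera_pos, float)
    forward /= np.linalg.norm(forward)                 # optical axis (+Z)
    right = np.cross(up, forward)
    right /= np.linalg.norm(right)                     # camera +X
    true_up = np.cross(forward, right)                 # camera +Y
    # Columns are the camera axes expressed in the world frame.
    return R.from_matrix(np.column_stack([right, true_up, forward])).as_euler("xyz")

# Example: camera hovering above the workspace, gazing at the origin.
print(look_at_euler([0.3, 0.0, 0.5], [0.0, 0.0, 0.0]))
```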
Saved in:
Published in: | arXiv.org 2024-11 |
---|---|
Main authors: | Sun, Xiatao; Fan, Francis; Chen, Yinxing; Rakita, Daniel |
Format: | Article |
Language: | eng |
Keywords: | |
Online access: | Full text |
container_end_page | |
---|---|
container_issue | |
container_start_page | |
container_title | arXiv.org |
container_volume | |
creator | Sun, Xiatao; Fan, Francis; Chen, Yinxing; Rakita, Daniel |
description | Robotic manipulation tasks often rely on static cameras for perception, which can limit flexibility, particularly in scenarios like robotic surgery and cluttered environments where mounting static cameras is impractical. Ideally, robots could jointly learn a policy for dynamic viewpoint and manipulation. However, it remains unclear which state-action space is most suitable for this complex learning process. To enable manipulation with dynamic viewpoints and to better understand impacts from different state-action spaces on this policy learning process, we conduct a comparative study on the state-action spaces for policy learning and their impacts on the performance of visuomotor policies that integrate viewpoint selection with manipulation. Specifically, we examine the configuration space of the robotic system, the end-effector space with a dual-arm Inverse Kinematics (IK) solver, and the reduced end-effector space with a look-at IK solver to optimize rotation for viewpoint selection. We also assess variants with different rotation representations. Our results demonstrate that state-action spaces utilizing Euler angles with the look-at IK achieve superior task success rates compared to other spaces. Further analysis suggests that these performance differences are driven by inherent variations in the high-frequency components across different state-action spaces and rotation representations. |
format | Article |
fulltext | fulltext |
identifier | EISSN: 2331-8422 |
ispartof | arXiv.org, 2024-11 |
issn | 2331-8422 |
language | eng |
recordid | cdi_proquest_journals_3108868884 |
source | Free E-Journals |
subjects | Cameras; Comparative studies; Configuration management; End effectors; Euler angles; Inverse kinematics; Learning; Representations; Robot learning; Robotic surgery; Robotics; Rotation; Solvers; Task complexity |
title | A Comparative Study on State-Action Spaces for Learning Viewpoint Selection and Manipulation with Diffusion Policy |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-20T12%3A09%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=A%20Comparative%20Study%20on%20State-Action%20Spaces%20for%20Learning%20Viewpoint%20Selection%20and%20Manipulation%20with%20Diffusion%20Policy&rft.jtitle=arXiv.org&rft.au=Sun,%20Xiatao&rft.date=2024-11-13&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3108868884%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3108868884&rft_id=info:pmid/&rfr_iscdi=true |
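The description above attributes the performance gap between state-action spaces to differences in their high-frequency components. The authors' analysis pipeline is not part of this record; the sketch below only illustrates one plausible way to compare spectral energy of the same orientation trajectory under Euler-angle and quaternion encodings (the `high_freq_ratio` helper, the toy trajectory, and the 0.25 cutoff are assumptions, not from the paper).

```python
# Illustrative only -- not the authors' analysis. Compares high-frequency
# spectral energy of one orientation trajectory under two rotation encodings.
import numpy as np
from scipy.spatial.transform import Rotation as R

def high_freq_ratio(signal, cutoff=0.25):
    """Fraction of spectral energy above `cutoff` of the analyzed band,
    pooled over all signal dimensions (the cutoff value here is arbitrary)."""
    spec = np.abs(np.fft.rfft(signal - signal.mean(axis=0), axis=0)) ** 2
    k = int(cutoff * spec.shape[0])
    return spec[k:].sum() / (spec.sum() + 1e-12)

# Toy trajectory: a smooth yaw sweep with mild sensor-like jitter.
t = np.linspace(0.0, 1.0, 200)
yaw = 1.5 * np.sin(2 * np.pi * t) + 0.02 * np.random.randn(t.size)
rots = R.from_euler("xyz", np.column_stack([np.zeros_like(t), np.zeros_like(t), yaw]))

print("euler xyz :", high_freq_ratio(rots.as_euler("xyz")))
print("quaternion:", high_freq_ratio(rots.as_quat()))
```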