A Comparative Study on State-Action Spaces for Learning Viewpoint Selection and Manipulation with Diffusion Policy

Robotic manipulation tasks often rely on static cameras for perception, which can limit flexibility, particularly in scenarios like robotic surgery and cluttered environments where mounting static cameras is impractical. Ideally, robots could jointly learn a policy for dynamic viewpoint and manipulation. However, it remains unclear which state-action space is most suitable for this complex learning process. To enable manipulation with dynamic viewpoints and to better understand impacts from different state-action spaces on this policy learning process, we conduct a comparative study on the state-action spaces for policy learning and their impacts on the performance of visuomotor policies that integrate viewpoint selection with manipulation. Specifically, we examine the configuration space of the robotic system, the end-effector space with a dual-arm Inverse Kinematics (IK) solver, and the reduced end-effector space with a look-at IK solver to optimize rotation for viewpoint selection. We also assess variants with different rotation representations. Our results demonstrate that state-action spaces utilizing Euler angles with the look-at IK achieve superior task success rates compared to other spaces. Further analysis suggests that these performance differences are driven by inherent variations in the high-frequency components across different state-action spaces and rotation representations.
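The abstract's central comparison is easiest to picture in code. The sketch below is an illustrative assumption, not the authors' implementation: it builds a look-at rotation that points a wrist-mounted camera's forward axis at a manipulation target (the constraint a look-at IK solver would enforce for the viewpoint arm) and converts it into the Euler-angle and quaternion forms the study compares. The axis conventions, the use of scipy, and all names and numbers are hypothetical.

```python
# Minimal sketch (assumed, not the paper's code): a "look-at" rotation whose +z axis
# points from the camera toward a target, converted to Euler-angle and quaternion
# representations. Axis conventions and all values below are illustrative assumptions.
import numpy as np
from scipy.spatial.transform import Rotation as R

def look_at_rotation(eye, target, up=np.array([0.0, 0.0, 1.0])):
    """Return a 3x3 rotation matrix whose +z column points from `eye` toward `target`."""
    forward = target - eye
    forward = forward / np.linalg.norm(forward)
    right = np.cross(up, forward)
    right = right / np.linalg.norm(right)   # degenerate if forward is parallel to `up`
    true_up = np.cross(forward, right)
    # Columns are the camera frame's x (right), y (up), z (forward) axes in world coordinates.
    return np.stack([right, true_up, forward], axis=1)

eye = np.array([0.40, 0.00, 0.50])      # hypothetical camera position
target = np.array([0.60, 0.10, 0.05])   # hypothetical manipulation target

rot = R.from_matrix(look_at_rotation(eye, target))
print("Euler (xyz):", rot.as_euler("xyz"))  # 3 numbers per end effector
print("Quaternion :", rot.as_quat())        # 4 numbers (x, y, z, w) per end effector
```

Under such a look-at constraint, the viewpoint arm's orientation is derived from its position and the target rather than commanded directly, which is what shrinks the reduced end-effector space relative to the full dual-arm end-effector space; Euler angles encode the remaining rotation in three numbers, while quaternions use four.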


Bibliographic Details
Published in: arXiv.org, 2024-11
Main Authors: Sun, Xiatao; Fan, Francis; Chen, Yinxing; Rakita, Daniel
Format: Article
Language: eng
Subjects: Cameras; Comparative studies; Configuration management; End effectors; Euler angles; Inverse kinematics; Learning; Representations; Robot learning; Robotic surgery; Robotics; Rotation; Solvers; Task complexity
Online Access: Full Text
container_title arXiv.org
creator Sun, Xiatao; Fan, Francis; Chen, Yinxing; Rakita, Daniel
format Article
fulltext fulltext
identifier EISSN: 2331-8422
ispartof arXiv.org, 2024-11
issn 2331-8422
language eng
recordid cdi_proquest_journals_3108868884
source Free E-Journals
subjects Cameras
Comparative studies
Configuration management
End effectors
Euler angles
Inverse kinematics
Learning
Representations
Robot learning
Robotic surgery
Robotics
Rotation
Solvers
Task complexity
title A Comparative Study on State-Action Spaces for Learning Viewpoint Selection and Manipulation with Diffusion Policy
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2024-12-20T12%3A09%3A34IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest&rft_val_fmt=info:ofi/fmt:kev:mtx:book&rft.genre=document&rft.atitle=A%20Comparative%20Study%20on%20State-Action%20Spaces%20for%20Learning%20Viewpoint%20Selection%20and%20Manipulation%20with%20Diffusion%20Policy&rft.jtitle=arXiv.org&rft.au=Sun,%20Xiatao&rft.date=2024-11-13&rft.eissn=2331-8422&rft_id=info:doi/&rft_dat=%3Cproquest%3E3108868884%3C/proquest%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3108868884&rft_id=info:pmid/&rfr_iscdi=true