I2P Registration by Learning the Underlying Alignment Feature Space from Pixel-to-Point Similarities

Estimating the relative pose between a camera and a LiDAR holds paramount importance in facilitating complex task execution within multi-agent systems. Nonetheless, current methodologies encounter two primary limitations. First, during cross-modal feature extraction, they typically employ separate modal branches to extract features from images and point clouds. This approach leaves the feature spaces of images and point clouds misaligned, thereby reducing the robustness of establishing correspondences. Second, due to the scale differences between images and point clouds, one-to-many pixel-point correspondences are inevitably encountered, which mislead the pose optimization. To address these challenges, we propose a framework named Image-to-Point cloud registration by learning the underlying alignment feature space from Pixel-to-Point Similarities (\(\text{I2P}_{\text{ppsim}}\)). Central to \(\text{I2P}_{\text{ppsim}}\) is a Shared Feature Alignment Module (SFAM). It is designed on a coarse-to-fine architecture and uses a weight-sharing network to construct an alignment feature space. Benefiting from SFAM, \(\text{I2P}_{\text{ppsim}}\) can effectively identify the co-view regions between images and point clouds and establish highly reliable 2D-3D correspondences. Moreover, to mitigate the one-to-many correspondence issue, we introduce a similarity maximization strategy termed point-max. This strategy effectively filters out outliers, thereby establishing accurate 2D-3D correspondences. To evaluate the efficacy of our framework, we conduct extensive experiments on KITTI Odometry and Oxford RobotCar. The results corroborate the effectiveness of our framework in improving image-to-point cloud registration. To make our results reproducible, the source codes have been released at https://cslinzhang.github.io/I2P
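The abstract's central idea, embedding pixels and points into a single alignment feature space so that finding 2D-3D correspondences reduces to a similarity lookup, can be illustrated with a toy computation. The sketch below is not the authors' SFAM; it is a minimal sketch assuming L2-normalized per-pixel and per-point descriptors produced by a weight-sharing embedding, from which a pixel-to-point cosine-similarity matrix follows. All function names and tensor shapes are illustrative assumptions.

```python
import numpy as np

def pixel_point_similarity(pixel_feats: np.ndarray, point_feats: np.ndarray) -> np.ndarray:
    """Cosine similarity between per-pixel and per-point descriptors.

    pixel_feats: (H*W, D) descriptors from the image branch.
    point_feats: (N, D)   descriptors from the point-cloud branch.
    If both branches share weights (as SFAM is described to do), the two
    descriptor sets live in one feature space and their dot products are
    directly comparable. Returns an (H*W, N) similarity matrix.
    """
    # L2-normalize so dot products become cosine similarities in [-1, 1].
    pix = pixel_feats / (np.linalg.norm(pixel_feats, axis=1, keepdims=True) + 1e-8)
    pts = point_feats / (np.linalg.norm(point_feats, axis=1, keepdims=True) + 1e-8)
    return pix @ pts.T

# Toy example: 4 pixels and 6 points with 16-D descriptors.
rng = np.random.default_rng(0)
sim = pixel_point_similarity(rng.normal(size=(4, 16)), rng.normal(size=(6, 16)))
print(sim.shape)  # (4, 6)
```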

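The abstract also names a point-max similarity-maximization strategy for suppressing one-to-many pixel-point matches, but this record does not spell out the exact rule. The snippet below is therefore only one plausible reading under stated assumptions: each pixel keeps at most its single best-scoring point, and pairs below a (hypothetical) `min_sim` threshold are discarded as ambiguous outliers.

```python
import numpy as np

def point_max_filter(sim: np.ndarray, min_sim: float = 0.9) -> list[tuple[int, int]]:
    """Hypothetical point-max-style filter (an assumption, not the paper's exact rule).

    sim: (P, N) pixel-to-point similarity matrix.
    For each pixel keep only its best-scoring point, then drop pairs whose
    score falls below `min_sim`, so each pixel yields at most one 2D-3D match.
    """
    best_point = sim.argmax(axis=1)                          # best point per pixel
    best_score = sim[np.arange(sim.shape[0]), best_point]    # its similarity score
    return [(p, int(best_point[p]))
            for p in range(sim.shape[0]) if best_score[p] >= min_sim]

# Example: rows 0 and 1 have one dominant match; row 2 is ambiguous and dropped.
sim = np.array([[0.95, 0.10, 0.20, 0.05],
                [0.15, 0.92, 0.30, 0.10],
                [0.40, 0.35, 0.50, 0.45]])
print(point_max_filter(sim))  # [(0, 0), (1, 1)]
```

The surviving (pixel, point) pairs would then feed a standard 2D-3D pose solver; the threshold value here is purely illustrative.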
Bibliographic Details
Published in: ACM Transactions on Multimedia Computing, Communications, and Applications, 2024-12, Vol. 20 (12), pp. 1-21, Article 388
Main authors: Sun, Yunda; Zhang, Lin; Wang, Zhong; Chen, Yang; Zhao, Shengjie; Zhou, Yicong
Format: Article
Language: English
Subjects: Computing methodologies; Vision for robotics
DOI: 10.1145/3697839
ISSN: 1551-6857
EISSN: 1551-6865
Publisher: ACM, New York, NY
Source: ACM Digital Library
Online access: Full text