PTM: Torus Masking for 3D Representation Learning Guided by Robust and Trusted Teachers

3D Masked Point Modeling (MPM) typically involves randomly or blockly discarding points or patches and then reconstructing them, offering a promising avenue for exploring geometric representation. By surveying current masking strategies, we have found that random-masked regions are provided with exc...

Detailed description

Bibliographic details
Published in: IEEE transactions on circuits and systems for video technology 2024-12, Vol.34 (12), p.12158-12170
Main authors: Cheng, Haozhe; Zhu, Jihua; Hu, Naiwen; Chen, Jinqian; Yan, Wenbiao
Format: Article
Language: eng
container_end_page 12170
container_issue 12
container_start_page 12158
container_title IEEE transactions on circuits and systems for video technology
container_volume 34
creator Cheng, Haozhe
Zhu, Jihua
Hu, Naiwen
Chen, Jinqian
Yan, Wenbiao
description 3D Masked Point Modeling (MPM) typically involves randomly or blockly discarding points or patches and then reconstructing them, offering a promising avenue for exploring geometric representation. By surveying current masking strategies, we have found that random-masked regions are provided with excessive context, reducing modeling difficulty but impeding knowledge transfer. While, block-masked regions lack sufficient guidance, resulting in significant generated noise. To address these issues, we propose PTM, a novel Transformer-style 3D MPM method employing a torus masking strategy. Specifically, a high-density area is chosen as the masked region, forming a torus by retaining small-radius neighborhoods around the center point. To mitigate torus modeling noise, the designed robust teacher model captures density scale to construct noise embedding, utilizing a reverse fit function for reconstruction assistance. Furthermore, the proposed trusted teacher model defines the multi-modal global descriptor as subjective evidence. On a semantic level, we form semi-subjective trusted evidence to guide reconstruction by evaluating the contribution of each subjective evidence to 3D representation. Downstream fine-tuning tasks validate the state-of-the-art performance of PTM in multi-scale point cloud classification and segmentation.
doi_str_mv 10.1109/TCSVT.2024.3430904
format Article
fullrecord <record><control><sourceid>proquest_RIE</sourceid><recordid>TN_cdi_ieee_primary_10604912</recordid><sourceformat>XML</sourceformat><sourcesystem>PC</sourcesystem><ieee_id>10604912</ieee_id><sourcerecordid>3147528722</sourcerecordid><originalsourceid>FETCH-LOGICAL-c177t-f3e1538e8edc8453f5f35b3ed2856128bd5e1de406f5015dcb14169e63eaf0943</originalsourceid><addsrcrecordid>eNpNkD1PwzAQhi0EEqXwBxCDJeYUnz8Shw0VKEhFoBJgtJz4DCmQFDsZ-u9JaQemO929H9JDyCmwCQDLL4rp82sx4YzLiZCC5UzukREopRPOmdofdqYg0RzUITmKcckYSC2zEXl7Kh4uadGGPtIHGz_r5p36NlBxTRe4Chix6WxXtw2dow3N5j3ra4eOlmu6aMs-dtQ2jhZDQDdcC7TVB4Z4TA68_Yp4sptj8nJ7U0zvkvnj7H56NU8qyLIu8QJBCY0aXaWlEl55oUqBjmuVAtelUwgOJUu9YqBcVYKENMdUoPUsl2JMzre5q9D-9Bg7s2z70AyVRoDMFNcZ54OKb1VVaGMM6M0q1N82rA0wswFo_gCaDUCzAziYzramGhH_GVImc-DiF7Kwa8Y</addsrcrecordid><sourcetype>Aggregation Database</sourcetype><iscdi>true</iscdi><recordtype>article</recordtype><pqid>3147528722</pqid></control><display><type>article</type><title>PTM: Torus Masking for 3D Representation Learning Guided by Robust and Trusted Teachers</title><source>IEEE Electronic Library (IEL)</source><creator>Cheng, Haozhe ; Zhu, Jihua ; Hu, Naiwen ; Chen, Jinqian ; Yan, Wenbiao</creator><creatorcontrib>Cheng, Haozhe ; Zhu, Jihua ; Hu, Naiwen ; Chen, Jinqian ; Yan, Wenbiao</creatorcontrib><description>3D Masked Point Modeling (MPM) typically involves randomly or blockly discarding points or patches and then reconstructing them, offering a promising avenue for exploring geometric representation. By surveying current masking strategies, we have found that random-masked regions are provided with excessive context, reducing modeling difficulty but impeding knowledge transfer. While, block-masked regions lack sufficient guidance, resulting in significant generated noise. To address these issues, we propose PTM, a novel Transformer-style 3D MPM method employing a torus masking strategy. 
Specifically, a high-density area is chosen as the masked region, forming a torus by retaining small-radius neighborhoods around the center point. To mitigate torus modeling noise, the designed robust teacher model captures density scale to construct noise embedding, utilizing a reverse fit function for reconstruction assistance. Furthermore, the proposed trusted teacher model defines the multi-modal global descriptor as subjective evidence. On a semantic level, we form semi-subjective trusted evidence to guide reconstruction by evaluating the contribution of each subjective evidence to 3D representation. Downstream fine-tuning tasks validate the state-of-the-art performance of PTM in multi-scale point cloud classification and segmentation.</description><identifier>ISSN: 1051-8215</identifier><identifier>EISSN: 1558-2205</identifier><identifier>DOI: 10.1109/TCSVT.2024.3430904</identifier><identifier>CODEN: ITCTEM</identifier><language>eng</language><publisher>New York: IEEE</publisher><subject>3D point cloud ; Deep learning ; evidence deep learning ; Image reconstruction ; Image segmentation ; Knowledge management ; Masking ; Modelling ; Noise measurement ; Point cloud compression ; Reconstruction ; representation learning ; Representations ; Residential density ; Robustness ; Self-supervised learning ; self-supervised network ; Solid modeling ; Three dimensional models ; Three-dimensional displays ; Toruses ; Uncertainty</subject><ispartof>IEEE transactions on circuits and systems for video technology, 2024-12, Vol.34 (12), p.12158-12170</ispartof><rights>Copyright The Institute of Electrical and Electronics Engineers, Inc. 
(IEEE) 2024</rights><lds50>peer_reviewed</lds50><woscitedreferencessubscribed>false</woscitedreferencessubscribed><cites>FETCH-LOGICAL-c177t-f3e1538e8edc8453f5f35b3ed2856128bd5e1de406f5015dcb14169e63eaf0943</cites><orcidid>0009-0001-2048-3375 ; 0000-0001-9884-9923 ; 0000-0002-3081-8781 ; 0000-0002-8723-9924 ; 0000-0001-6295-7850</orcidid></display><links><openurl>$$Topenurl_article</openurl><openurlfulltext>$$Topenurlfull_article</openurlfulltext><thumbnail>$$Tsyndetics_thumb_exl</thumbnail><linktohtml>$$Uhttps://ieeexplore.ieee.org/document/10604912$$EHTML$$P50$$Gieee$$H</linktohtml><link.rule.ids>314,776,780,792,27901,27902,54733</link.rule.ids><linktorsrc>$$Uhttps://ieeexplore.ieee.org/document/10604912$$EView_record_in_IEEE$$FView_record_in_$$GIEEE</linktorsrc></links><search><creatorcontrib>Cheng, Haozhe</creatorcontrib><creatorcontrib>Zhu, Jihua</creatorcontrib><creatorcontrib>Hu, Naiwen</creatorcontrib><creatorcontrib>Chen, Jinqian</creatorcontrib><creatorcontrib>Yan, Wenbiao</creatorcontrib><title>PTM: Torus Masking for 3D Representation Learning Guided by Robust and Trusted Teachers</title><title>IEEE transactions on circuits and systems for video technology</title><addtitle>TCSVT</addtitle><description>3D Masked Point Modeling (MPM) typically involves randomly or blockly discarding points or patches and then reconstructing them, offering a promising avenue for exploring geometric representation. By surveying current masking strategies, we have found that random-masked regions are provided with excessive context, reducing modeling difficulty but impeding knowledge transfer. While, block-masked regions lack sufficient guidance, resulting in significant generated noise. To address these issues, we propose PTM, a novel Transformer-style 3D MPM method employing a torus masking strategy. Specifically, a high-density area is chosen as the masked region, forming a torus by retaining small-radius neighborhoods around the center point. 
To mitigate torus modeling noise, the designed robust teacher model captures density scale to construct noise embedding, utilizing a reverse fit function for reconstruction assistance. Furthermore, the proposed trusted teacher model defines the multi-modal global descriptor as subjective evidence. On a semantic level, we form semi-subjective trusted evidence to guide reconstruction by evaluating the contribution of each subjective evidence to 3D representation. Downstream fine-tuning tasks validate the state-of-the-art performance of PTM in multi-scale point cloud classification and segmentation.</description><subject>3D point cloud</subject><subject>Deep learning</subject><subject>evidence deep learning</subject><subject>Image reconstruction</subject><subject>Image segmentation</subject><subject>Knowledge management</subject><subject>Masking</subject><subject>Modelling</subject><subject>Noise measurement</subject><subject>Point cloud compression</subject><subject>Reconstruction</subject><subject>representation learning</subject><subject>Representations</subject><subject>Residential density</subject><subject>Robustness</subject><subject>Self-supervised learning</subject><subject>self-supervised network</subject><subject>Solid modeling</subject><subject>Three dimensional models</subject><subject>Three-dimensional 
displays</subject><subject>Toruses</subject><subject>Uncertainty</subject><issn>1051-8215</issn><issn>1558-2205</issn><fulltext>true</fulltext><rsrctype>article</rsrctype><creationdate>2024</creationdate><recordtype>article</recordtype><sourceid>RIE</sourceid><recordid>eNpNkD1PwzAQhi0EEqXwBxCDJeYUnz8Shw0VKEhFoBJgtJz4DCmQFDsZ-u9JaQemO929H9JDyCmwCQDLL4rp82sx4YzLiZCC5UzukREopRPOmdofdqYg0RzUITmKcckYSC2zEXl7Kh4uadGGPtIHGz_r5p36NlBxTRe4Chix6WxXtw2dow3N5j3ra4eOlmu6aMs-dtQ2jhZDQDdcC7TVB4Z4TA68_Yp4sptj8nJ7U0zvkvnj7H56NU8qyLIu8QJBCY0aXaWlEl55oUqBjmuVAtelUwgOJUu9YqBcVYKENMdUoPUsl2JMzre5q9D-9Bg7s2z70AyVRoDMFNcZ54OKb1VVaGMM6M0q1N82rA0wswFo_gCaDUCzAziYzramGhH_GVImc-DiF7Kwa8Y</recordid><startdate>20241201</startdate><enddate>20241201</enddate><creator>Cheng, Haozhe</creator><creator>Zhu, Jihua</creator><creator>Hu, Naiwen</creator><creator>Chen, Jinqian</creator><creator>Yan, Wenbiao</creator><general>IEEE</general><general>The Institute of Electrical and Electronics Engineers, Inc. (IEEE)</general><scope>97E</scope><scope>RIA</scope><scope>RIE</scope><scope>AAYXX</scope><scope>CITATION</scope><scope>7SC</scope><scope>7SP</scope><scope>8FD</scope><scope>JQ2</scope><scope>L7M</scope><scope>L~C</scope><scope>L~D</scope><orcidid>https://orcid.org/0009-0001-2048-3375</orcidid><orcidid>https://orcid.org/0000-0001-9884-9923</orcidid><orcidid>https://orcid.org/0000-0002-3081-8781</orcidid><orcidid>https://orcid.org/0000-0002-8723-9924</orcidid><orcidid>https://orcid.org/0000-0001-6295-7850</orcidid></search><sort><creationdate>20241201</creationdate><title>PTM: Torus Masking for 3D Representation Learning Guided by Robust and Trusted Teachers</title><author>Cheng, Haozhe ; Zhu, Jihua ; Hu, Naiwen ; Chen, Jinqian ; Yan, 
Wenbiao</author></sort><facets><frbrtype>5</frbrtype><frbrgroupid>cdi_FETCH-LOGICAL-c177t-f3e1538e8edc8453f5f35b3ed2856128bd5e1de406f5015dcb14169e63eaf0943</frbrgroupid><rsrctype>articles</rsrctype><prefilter>articles</prefilter><language>eng</language><creationdate>2024</creationdate><topic>3D point cloud</topic><topic>Deep learning</topic><topic>evidence deep learning</topic><topic>Image reconstruction</topic><topic>Image segmentation</topic><topic>Knowledge management</topic><topic>Masking</topic><topic>Modelling</topic><topic>Noise measurement</topic><topic>Point cloud compression</topic><topic>Reconstruction</topic><topic>representation learning</topic><topic>Representations</topic><topic>Residential density</topic><topic>Robustness</topic><topic>Self-supervised learning</topic><topic>self-supervised network</topic><topic>Solid modeling</topic><topic>Three dimensional models</topic><topic>Three-dimensional displays</topic><topic>Toruses</topic><topic>Uncertainty</topic><toplevel>peer_reviewed</toplevel><toplevel>online_resources</toplevel><creatorcontrib>Cheng, Haozhe</creatorcontrib><creatorcontrib>Zhu, Jihua</creatorcontrib><creatorcontrib>Hu, Naiwen</creatorcontrib><creatorcontrib>Chen, Jinqian</creatorcontrib><creatorcontrib>Yan, Wenbiao</creatorcontrib><collection>IEEE All-Society Periodicals Package (ASPP) 2005-present</collection><collection>IEEE All-Society Periodicals Package (ASPP) 1998-Present</collection><collection>IEEE Electronic Library (IEL)</collection><collection>CrossRef</collection><collection>Computer and Information Systems Abstracts</collection><collection>Electronics &amp; Communications Abstracts</collection><collection>Technology Research Database</collection><collection>ProQuest Computer Science Collection</collection><collection>Advanced Technologies Database with Aerospace</collection><collection>Computer and Information Systems Abstracts – Academic</collection><collection>Computer and Information Systems Abstracts 
Professional</collection><jtitle>IEEE transactions on circuits and systems for video technology</jtitle></facets><delivery><delcategory>Remote Search Resource</delcategory><fulltext>fulltext_linktorsrc</fulltext></delivery><addata><au>Cheng, Haozhe</au><au>Zhu, Jihua</au><au>Hu, Naiwen</au><au>Chen, Jinqian</au><au>Yan, Wenbiao</au><format>journal</format><genre>article</genre><ristype>JOUR</ristype><atitle>PTM: Torus Masking for 3D Representation Learning Guided by Robust and Trusted Teachers</atitle><jtitle>IEEE transactions on circuits and systems for video technology</jtitle><stitle>TCSVT</stitle><date>2024-12-01</date><risdate>2024</risdate><volume>34</volume><issue>12</issue><spage>12158</spage><epage>12170</epage><pages>12158-12170</pages><issn>1051-8215</issn><eissn>1558-2205</eissn><coden>ITCTEM</coden><abstract>3D Masked Point Modeling (MPM) typically involves randomly or blockly discarding points or patches and then reconstructing them, offering a promising avenue for exploring geometric representation. By surveying current masking strategies, we have found that random-masked regions are provided with excessive context, reducing modeling difficulty but impeding knowledge transfer. While, block-masked regions lack sufficient guidance, resulting in significant generated noise. To address these issues, we propose PTM, a novel Transformer-style 3D MPM method employing a torus masking strategy. Specifically, a high-density area is chosen as the masked region, forming a torus by retaining small-radius neighborhoods around the center point. To mitigate torus modeling noise, the designed robust teacher model captures density scale to construct noise embedding, utilizing a reverse fit function for reconstruction assistance. Furthermore, the proposed trusted teacher model defines the multi-modal global descriptor as subjective evidence. 
On a semantic level, we form semi-subjective trusted evidence to guide reconstruction by evaluating the contribution of each subjective evidence to 3D representation. Downstream fine-tuning tasks validate the state-of-the-art performance of PTM in multi-scale point cloud classification and segmentation.</abstract><cop>New York</cop><pub>IEEE</pub><doi>10.1109/TCSVT.2024.3430904</doi><tpages>13</tpages><orcidid>https://orcid.org/0009-0001-2048-3375</orcidid><orcidid>https://orcid.org/0000-0001-9884-9923</orcidid><orcidid>https://orcid.org/0000-0002-3081-8781</orcidid><orcidid>https://orcid.org/0000-0002-8723-9924</orcidid><orcidid>https://orcid.org/0000-0001-6295-7850</orcidid></addata></record>
fulltext fulltext_linktorsrc
identifier ISSN: 1051-8215
ispartof IEEE transactions on circuits and systems for video technology, 2024-12, Vol.34 (12), p.12158-12170
issn 1051-8215
1558-2205
language eng
recordid cdi_ieee_primary_10604912
source IEEE Electronic Library (IEL)
subjects 3D point cloud
Deep learning
evidence deep learning
Image reconstruction
Image segmentation
Knowledge management
Masking
Modelling
Noise measurement
Point cloud compression
Reconstruction
representation learning
Representations
Residential density
Robustness
Self-supervised learning
self-supervised network
Solid modeling
Three dimensional models
Three-dimensional displays
Toruses
Uncertainty
title PTM: Torus Masking for 3D Representation Learning Guided by Robust and Trusted Teachers
url https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T09%3A25%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=PTM:%20Torus%20Masking%20for%203D%20Representation%20Learning%20Guided%20by%20Robust%20and%20Trusted%20Teachers&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Cheng,%20Haozhe&rft.date=2024-12-01&rft.volume=34&rft.issue=12&rft.spage=12158&rft.epage=12170&rft.pages=12158-12170&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2024.3430904&rft_dat=%3Cproquest_RIE%3E3147528722%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3147528722&rft_id=info:pmid/&rft_ieee_id=10604912&rfr_iscdi=true