PTM: Torus Masking for 3D Representation Learning Guided by Robust and Trusted Teachers
3D Masked Point Modeling (MPM) typically involves discarding points or patches, either randomly or in blocks, and then reconstructing them, offering a promising avenue for exploring geometric representation. By surveying current masking strategies, we have found that random-masked regions are provided with exc...
Published in: | IEEE transactions on circuits and systems for video technology 2024-12, Vol.34 (12), p.12158-12170 |
---|---|
Main authors: | Cheng, Haozhe ; Zhu, Jihua ; Hu, Naiwen ; Chen, Jinqian ; Yan, Wenbiao |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
container_end_page | 12170 |
---|---|
container_issue | 12 |
container_start_page | 12158 |
container_title | IEEE transactions on circuits and systems for video technology |
container_volume | 34 |
creator | Cheng, Haozhe ; Zhu, Jihua ; Hu, Naiwen ; Chen, Jinqian ; Yan, Wenbiao |
description | 3D Masked Point Modeling (MPM) typically involves discarding points or patches, either randomly or in blocks, and then reconstructing them, offering a promising avenue for exploring geometric representation. By surveying current masking strategies, we have found that random-masked regions are provided with excessive context, which reduces modeling difficulty but impedes knowledge transfer, whereas block-masked regions lack sufficient guidance, resulting in significant generated noise. To address these issues, we propose PTM, a novel Transformer-style 3D MPM method employing a torus masking strategy. Specifically, a high-density area is chosen as the masked region, forming a torus by retaining small-radius neighborhoods around the center point. To mitigate torus modeling noise, the designed robust teacher model captures density scale to construct a noise embedding, utilizing a reverse fit function for reconstruction assistance. Furthermore, the proposed trusted teacher model defines the multi-modal global descriptor as subjective evidence. On a semantic level, we form semi-subjective trusted evidence to guide reconstruction by evaluating the contribution of each piece of subjective evidence to 3D representation. Downstream fine-tuning tasks validate the state-of-the-art performance of PTM in multi-scale point cloud classification and segmentation. |
doi_str_mv | 10.1109/TCSVT.2024.3430904 |
format | Article |
fullrecord | Record ID: TN_cdi_ieee_primary_10604912 ; IEEE ID: 10604912 ; ProQuest ID: 3147528722 ; Source ID: proquest_RIE ; Type: article ; Source: IEEE Electronic Library (IEL) ; ISSN: 1051-8215 ; EISSN: 1558-2205 ; DOI: 10.1109/TCSVT.2024.3430904 ; CODEN: ITCTEM ; Publisher: New York: IEEE ; Publication date: 2024-12-01 ; Pages: 12158-12170 (13 pages) ; Peer reviewed ; Rights: Copyright The Institute of Electrical and Electronics Engineers, Inc. (IEEE) 2024 ; ORCID iDs: 0009-0001-2048-3375 ; 0000-0001-9884-9923 ; 0000-0002-3081-8781 ; 0000-0002-8723-9924 ; 0000-0001-6295-7850 ; Link: https://ieeexplore.ieee.org/document/10604912 |
fulltext | fulltext_linktorsrc |
identifier | ISSN: 1051-8215 |
ispartof | IEEE transactions on circuits and systems for video technology, 2024-12, Vol.34 (12), p.12158-12170 |
issn | 1051-8215 ; 1558-2205 |
language | eng |
recordid | cdi_ieee_primary_10604912 |
source | IEEE Electronic Library (IEL) |
subjects | 3D point cloud ; Deep learning ; evidence deep learning ; Image reconstruction ; Image segmentation ; Knowledge management ; Masking ; Modelling ; Noise measurement ; Point cloud compression ; Reconstruction ; representation learning ; Representations ; Residential density ; Robustness ; Self-supervised learning ; self-supervised network ; Solid modeling ; Three dimensional models ; Three-dimensional displays ; Toruses ; Uncertainty |
title | PTM: Torus Masking for 3D Representation Learning Guided by Robust and Trusted Teachers |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-02-10T09%3A25%3A01IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-proquest_RIE&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=PTM:%20Torus%20Masking%20for%203D%20Representation%20Learning%20Guided%20by%20Robust%20and%20Trusted%20Teachers&rft.jtitle=IEEE%20transactions%20on%20circuits%20and%20systems%20for%20video%20technology&rft.au=Cheng,%20Haozhe&rft.date=2024-12-01&rft.volume=34&rft.issue=12&rft.spage=12158&rft.epage=12170&rft.pages=12158-12170&rft.issn=1051-8215&rft.eissn=1558-2205&rft.coden=ITCTEM&rft_id=info:doi/10.1109/TCSVT.2024.3430904&rft_dat=%3Cproquest_RIE%3E3147528722%3C/proquest_RIE%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_pqid=3147528722&rft_id=info:pmid/&rft_ieee_id=10604912&rfr_iscdi=true |
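
The abstract describes the torus masking idea only in words: a high-density area is masked, while a small-radius neighborhood around its center point is retained. The following is a minimal sketch of that geometric idea, not the authors' implementation. It assumes a brute-force neighbor count as the density estimate and two fixed radii; the function names `densest_center` and `torus_mask` and all parameter values are hypothetical and do not come from the paper, which works on patches inside a Transformer pipeline.

```python
import math

def densest_center(points, radius):
    """Return the index of the point with the most neighbours within `radius`.

    Assumption: density is estimated by a brute-force neighbour count.
    Ties are resolved in favour of the lowest index.
    """
    best_idx, best_count = 0, -1
    for i, p in enumerate(points):
        count = sum(1 for q in points if math.dist(p, q) <= radius)
        if count > best_count:
            best_idx, best_count = i, count
    return best_idx

def torus_mask(points, center_idx, inner_radius, outer_radius):
    """Return a list of booleans, True where a point is masked.

    Points closer to the centre than `inner_radius` are retained (the
    small-radius neighbourhood), points farther than `outer_radius` are
    untouched, and everything in between forms the masked shell.
    """
    center = points[center_idx]
    return [
        inner_radius < math.dist(p, center) <= outer_radius
        for p in points
    ]

if __name__ == "__main__":
    # Toy point cloud: a small cluster near the origin plus one far point.
    cloud = [(0, 0, 0), (0.05, 0, 0), (0.3, 0, 0), (0, 0.4, 0), (2, 2, 2)]
    mask = torus_mask(cloud, center_idx=0, inner_radius=0.1, outer_radius=0.5)
    visible = [p for p, masked in zip(cloud, mask) if not masked]
    print("masked flags:", mask)
    print("visible points:", visible)
```

In this toy setting the center point and its nearest neighbor stay visible, which is the extra guidance the abstract contrasts with plain block masking, while the far point is unaffected because it lies outside the outer radius.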