Beyond Pattern Variance: Unsupervised 3-D Action Representation Learning With Point Cloud Sequence

Abstract:
This work makes the first research effort to address unsupervised 3-D action representation learning with point cloud sequences, in contrast to existing unsupervised methods that rely on 3-D skeleton information. Our proposition is built on the state-of-the-art 3-D action descriptor, the 3-D dynamic voxel (3DV), combined with contrastive learning (CL). 3DV compresses a point cloud sequence into a compact point cloud that encodes 3-D motion information. Spatiotemporal data augmentations are conducted on it to drive CL. However, we find that existing CL methods (e.g., SimCLR or MoCo v2) often suffer from high pattern variance across the augmented 3DV samples from the same action instance: the augmented 3DV samples remain highly complementary in feature space after CL, yet the complementary discriminative clues within them have not been well exploited. To address this, a feature augmentation adapted CL (FACL) approach is proposed, which facilitates 3-D action representation by considering the features from all augmented 3DV samples jointly, in the spirit of feature augmentation. FACL runs in a global-local way: one branch learns a global feature that involves the discriminative clues from the raw and augmented 3DV samples, and the other focuses on enhancing the discriminative power of the local feature learned from each augmented 3DV sample. The global and local features are fused via concatenation to characterize 3-D action jointly. To fit FACL, a series of spatiotemporal data augmentation approaches is also studied on 3DV. Wide-ranging experiments verify the superiority of our unsupervised method for 3-D action feature learning: it outperforms the state-of-the-art skeleton-based counterparts by 6.4% and 3.6% under the cross-setup and cross-subject test settings on NTU RGB+D 120, respectively. The source code is available at https://github.com/tangent-T/FACL .
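
The abstract's central preprocessing step is compressing a whole point cloud sequence into one compact, motion-encoding point set. A short NumPy sketch of that general idea follows; the grid resolution, normalization bounds, and linear temporal weighting are illustrative assumptions, and the paper's actual 3DV descriptor uses its own rank-pooling-style temporal aggregation, so treat this as a sketch of the concept rather than the authors' implementation.

```python
import numpy as np

def compress_to_motion_points(frames, grid=(32, 32, 32), lo=-1.0, hi=1.0):
    """Compress a point cloud sequence into one motion-encoding point set.

    frames: list of (N_t, 3) float arrays, one per time step, assumed
    normalized into [lo, hi]^3. Returns an (M, 4) array holding occupied
    voxel centers plus a scalar value that encodes temporal order.
    """
    T = len(frames)
    size = np.asarray(grid)
    occ = np.zeros((T, *grid), dtype=np.float32)   # per-frame occupancy grids
    for t, pts in enumerate(frames):
        idx = ((pts - lo) / (hi - lo) * size).astype(int)
        idx = np.clip(idx, 0, size - 1)
        occ[t, idx[:, 0], idx[:, 1], idx[:, 2]] = 1.0
    # Order-aware weights: voxels occupied late in the sequence get positive
    # values, early ones negative (a simple stand-in for rank pooling).
    w = np.linspace(-1.0, 1.0, T, dtype=np.float32).reshape(T, 1, 1, 1)
    motion = (w * occ).sum(axis=0)
    vox = np.argwhere(motion != 0)                  # keep occupied voxels only
    vals = motion[vox[:, 0], vox[:, 1], vox[:, 2]]
    centers = (vox + 0.5) / size * (hi - lo) + lo   # voxel centers, world coords
    return np.concatenate([centers, vals[:, None]], axis=1)
```

The payoff is that a variable-length sequence collapses into a single (M, 4) point set that an ordinary point cloud backbone can consume, which is what makes instance-level contrastive training on whole sequences tractable.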

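The global-local fusion that the abstract attributes to FACL can likewise be sketched. In the hypothetical PyTorch module below, the encoder, the two projection heads, and mean pooling over views are all assumptions made for illustration; only the overall shape (a joint global feature, a per-view local feature, and concatenation of the two) follows the abstract.

```python
import torch
import torch.nn as nn

class GlobalLocalDescriptor(nn.Module):
    """Hypothetical sketch of FACL-style global-local feature fusion.

    encoder: maps a batch of 3DV point sets (B*V, N, 4) to (B*V, dim).
    The global branch pools the features of all V augmented views of an
    instance jointly; the local branch refines each view's own feature.
    The action descriptor concatenates the two branches.
    """

    def __init__(self, encoder: nn.Module, dim: int):
        super().__init__()
        self.encoder = encoder
        self.global_head = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))
        self.local_head = nn.Sequential(
            nn.Linear(dim, dim), nn.ReLU(), nn.Linear(dim, dim))

    def forward(self, views: torch.Tensor) -> torch.Tensor:
        # views: (B, V, N, 4), i.e. V augmented 3DV samples per instance.
        B, V = views.shape[:2]
        feats = self.encoder(views.flatten(0, 1)).view(B, V, -1)  # (B, V, dim)
        g = self.global_head(feats.mean(dim=1))     # joint over all views
        l = self.local_head(feats).mean(dim=1)      # per view, then pooled
        return torch.cat([g, l], dim=-1)            # (B, 2*dim) descriptor
```

During training each branch would be driven by a contrastive objective such as the InfoNCE loss used by SimCLR and MoCo v2, the baselines the abstract names; at test time the concatenated descriptor characterizes the action.
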
Bibliographic details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2024-12, Vol. 35 (12), pp. 18186-18199
Authors: Tan, Bo; Xiao, Yang; Wang, Yancheng; Li, Shuai; Yang, Jianyu; Cao, Zhiguo; Zhou, Joey Tianyi; Yuan, Junsong
Format: Article
Language: English
Subjects: Contrastive learning (CL); Data augmentation; Feature augmentation; Point cloud compression; Point cloud sequence; Representation learning; Skeleton; Spatiotemporal phenomena; Three-dimensional displays; Training; Unsupervised 3-D action representation learning
Online access: Order full text
DOI: 10.1109/TNNLS.2023.3312673
PMID: 37729565
Publisher: IEEE
ISSN: 2162-237X
EISSN: 2162-2388
Source: IEEE Electronic Library (IEL)