Beyond Pattern Variance: Unsupervised 3-D Action Representation Learning With Point Cloud Sequence
This work makes the first research effort to address unsupervised 3-D action representation learning with point cloud sequences, in contrast to existing unsupervised methods that rely on 3-D skeleton information. Our proposition builds on the state-of-the-art 3-D action descriptor, the 3-D dynamic voxel (3DV), combined with contrastive learning (CL). 3DV compresses a point cloud sequence into a compact point cloud carrying 3-D motion information, and spatiotemporal data augmentations are conducted on it to drive CL. However, we find that existing CL methods (e.g., SimCLR or MoCo v2) often suffer from high pattern variance toward the augmented 3DV samples from the same action instance: the augmented 3DV samples still exhibit high feature complementarity after CL, while the complementary discriminative clues within them have not been well exploited.
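The paper drives CL with spatiotemporal data augmentations applied to the compact 3DV point set. The authors' exact augmentation suite is in the linked repository; as a minimal sketch, assuming a 3DV-style sample is an (N, 4) array of x, y, z coordinates plus a per-point motion value, plausible geometric augmentations could look like the following (function names and parameter values are illustrative assumptions, not the authors' API):

```python
import numpy as np

def rotate_y(points: np.ndarray, max_deg: float = 30.0) -> np.ndarray:
    """Randomly rotate the spatial (x, y, z) part about the vertical axis.

    `points` is (N, 4): x, y, z plus a per-point motion value, as in a
    3DV-style compact point set. Only the first three columns are rotated.
    """
    theta = np.deg2rad(np.random.uniform(-max_deg, max_deg))
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, 0.0, s],
                    [0.0, 1.0, 0.0],
                    [-s, 0.0, c]])
    out = points.copy()
    out[:, :3] = out[:, :3] @ rot.T
    return out

def jitter(points: np.ndarray, sigma: float = 0.01, clip: float = 0.05) -> np.ndarray:
    """Add small clipped Gaussian noise to the spatial coordinates."""
    noise = np.clip(sigma * np.random.randn(points.shape[0], 3), -clip, clip)
    out = points.copy()
    out[:, :3] += noise
    return out

def random_drop(points: np.ndarray, keep_ratio: float = 0.9) -> np.ndarray:
    """Drop a random subset of points (an occlusion-style augmentation)."""
    n_keep = int(points.shape[0] * keep_ratio)
    idx = np.random.choice(points.shape[0], n_keep, replace=False)
    return points[idx]

def augment(points: np.ndarray) -> np.ndarray:
    """Compose the augmentations to produce one view for contrastive learning."""
    return random_drop(jitter(rotate_y(points)))

# Example: two augmented views of the same 3DV sample form a positive pair.
dv = np.random.rand(2048, 4).astype(np.float32)  # stand-in for a real 3DV
view_a, view_b = augment(dv), augment(dv)
```

Two such augmented views of one 3DV sample form the positive pair that SimCLR- or MoCo-style objectives pull together.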
To address this, a feature augmentation adapted CL (FACL) approach is proposed, which facilitates 3-D action representation by attending to the features from all augmented 3DV samples jointly, in the spirit of feature augmentation. FACL runs in a global-local way: one branch learns a global feature that draws discriminative clues from the raw and augmented 3DV samples together, while the other enhances the discriminative power of the local feature learned from each augmented 3DV sample. The global and local features are fused via concatenation to characterize the 3-D action jointly. To fit FACL, a series of spatiotemporal data augmentation approaches is also studied on 3DV. Wide-ranging experiments verify the superiority of our unsupervised method for 3-D action feature learning: it outperforms the state-of-the-art skeleton-based counterparts by 6.4% and 3.6% under the cross-setup and cross-subject test settings on NTU RGB+D 120, respectively. The source code is available at https://github.com/tangent-T/FACL.
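The FACL objective itself is defined in the paper and repository; the sketch below only illustrates the two generic ingredients named in the abstract, under stated assumptions: a SimCLR-style InfoNCE loss over positive pairs, and a hypothetical two-branch head whose global and local embeddings are fused by concatenation. `GlobalLocalHead`, the mean pooling, and all dimensions are illustrative choices, not the authors' architecture.

```python
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.07) -> torch.Tensor:
    """SimCLR-style InfoNCE over a batch of positive pairs.

    z1, z2: (B, D) embeddings of two views of the same B instances; row i of
    z1 and row i of z2 are the positive pair, all other rows are negatives.
    """
    z1 = F.normalize(z1, dim=1)
    z2 = F.normalize(z2, dim=1)
    logits = z1 @ z2.t() / tau                            # (B, B) similarities
    labels = torch.arange(z1.size(0), device=z1.device)   # positives on diagonal
    return F.cross_entropy(logits, labels)

class GlobalLocalHead(torch.nn.Module):
    """Hypothetical two-branch head: a global embedding computed from all
    views of an instance jointly, a local embedding per view, and a fused
    representation obtained by concatenation, as the abstract describes."""

    def __init__(self, feat_dim: int = 256, emb_dim: int = 128):
        super().__init__()
        self.global_proj = torch.nn.Linear(feat_dim, emb_dim)
        self.local_proj = torch.nn.Linear(feat_dim, emb_dim)

    def forward(self, view_feats: torch.Tensor):
        # view_feats: (B, V, feat_dim) backbone features for V views
        # (raw + augmented 3DV samples) of each of B action instances.
        global_z = self.global_proj(view_feats.mean(dim=1))  # (B, emb_dim)
        local_z = self.local_proj(view_feats)                # (B, V, emb_dim)
        fused = torch.cat([global_z, local_z.mean(dim=1)], dim=1)  # (B, 2*emb_dim)
        return global_z, local_z, fused

# Toy usage: contrast the global embeddings of two sets of augmented views.
head = GlobalLocalHead()
feats_a = torch.randn(8, 3, 256)   # batch of 8 instances, 3 views each
feats_b = torch.randn(8, 3, 256)
ga, _, _ = head(feats_a)
gb, _, _ = head(feats_b)
loss = info_nce(ga, gb)
```

In training, `info_nce` would be applied to embeddings of augmented views of the same 3DV sample; at evaluation, the concatenated `fused` vector would serve as the 3-D action representation.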
Published in: | IEEE Transactions on Neural Networks and Learning Systems, 2024-12, Vol. 35 (12), pp. 18186-18199 |
---|---|
Main authors: | Tan, Bo; Xiao, Yang; Wang, Yancheng; Li, Shuai; Yang, Jianyu; Cao, Zhiguo; Zhou, Joey Tianyi; Yuan, Junsong |
Format: | Article |
Language: | English |
Subjects: | Contrastive learning (CL); Data augmentation; Feature augmentation; Point cloud compression; Point cloud sequence; Representation learning; Skeleton; Spatiotemporal phenomena; Three-dimensional displays; Training; Unsupervised 3-D action representation learning |
DOI: | 10.1109/TNNLS.2023.3312673 |
ISSN: | 2162-237X |
EISSN: | 2162-2388 |
PMID: | 37729565 |
CODEN: | ITNNAL |
Publisher: | IEEE (United States) |
Online access: | https://doi.org/10.1109/TNNLS.2023.3312673 |