SemiCMT: Contrastive Cross-Modal Knowledge Transfer for IoT Sensing with Semi-Paired Multi-Modal Signals
This paper proposes a novel contrastive cross-modal knowledge transfer framework, SemiCMT, for multi-modal IoT sensing applications. It effectively transfers the feature extraction capability (also called knowledge) learned from a source modality (e.g., acoustic signals) with abundant unlabeled training data to a target modality (e.g., seismic signals) that lacks sufficient training data, in a self-supervised manner with the help of only a small set of synchronized multi-modal pairs. The transferred model can be quickly finetuned to downstream target-modal tasks with only limited labels. The key design consists of three aspects: First, we factorize the latent embedding of each modality into shared and private components and perform knowledge transfer considering both the modality information commonality and gaps. Second, we enforce structural correlation constraints between the source modality and the target modality, to push the target modal embedding space symmetric to the source modal embedding space, with the anchoring of additional source-modal samples, which expands the existing modal-matching objective in current multi-modal contrastive frameworks. Finally, we conduct downstream task finetuning in the spherical space with a KNN classifier to better align with the structured modality embedding space. Extensive evaluations on five multimodal IoT datasets are performed to validate the effectiveness of SemiCMT in cross-modal knowledge transfer, including a new self-collected dataset using seismic and acoustic signals for office activity monitoring. SemiCMT consistently outperforms existing self-supervised and knowledge transfer approaches by up to 36.47% in the finetuned target-modal classification tasks. The code and the self-collected dataset will be released at https://github.com/SJTU-RTEAS/SemiCMT.
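The abstract's third design aspect, downstream finetuning with a KNN classifier in the spherical (L2-normalized) embedding space, can be illustrated with a minimal sketch. This is not the authors' released implementation (the GitHub repository above is the authoritative source); the function names, array shapes, and the cosine-similarity KNN are illustrative assumptions based only on the abstract.

```python
# Minimal sketch, assuming target-modal embeddings come from the transferred
# encoder: classify labeled samples with a KNN classifier on the unit sphere,
# where "nearest" is measured by cosine similarity.
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Project embeddings onto the unit sphere (row-wise L2 normalization)."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def knn_predict(train_emb, train_labels, query_emb, k=5):
    """Predict labels by majority vote over the k nearest neighbors on the sphere."""
    train_emb = l2_normalize(train_emb)
    query_emb = l2_normalize(query_emb)
    sims = query_emb @ train_emb.T               # cosine similarities, shape (Q, N)
    nn_idx = np.argsort(-sims, axis=1)[:, :k]    # indices of the k most similar samples
    nn_labels = train_labels[nn_idx]             # labels of those neighbors, shape (Q, k)
    return np.array([np.bincount(row).argmax() for row in nn_labels])

# Toy usage with random stand-ins for labeled target-modal embeddings.
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(100, 64))
train_labels = rng.integers(0, 4, size=100)
query_emb = rng.normal(size=(10, 64))
print(knn_predict(train_emb, train_labels, query_emb, k=5))
```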
Saved in:
Published in: | Proceedings of ACM on interactive, mobile, wearable and ubiquitous technologies, 2024-11, Vol.8 (4), p.1-30, Article 198 |
---|---|
Main Authors: | Chen, Yatong; Hu, Chenzhi; Kimura, Tomoyoshi; Li, Qinya; Liu, Shengzhong; Wu, Fan; Chen, Guihai |
Format: | Article |
Language: | English |
Subjects: | Computing methodologies; Human-centered computing; Machine learning; Ubiquitous and mobile computing; Ubiquitous and mobile computing theory, concepts and paradigms |
Online access: | Full text |
container_end_page | 30 |
---|---|
container_issue | 4 |
container_start_page | 1 |
container_title | Proceedings of ACM on interactive, mobile, wearable and ubiquitous technologies |
container_volume | 8 |
creator | Chen, Yatong; Hu, Chenzhi; Kimura, Tomoyoshi; Li, Qinya; Liu, Shengzhong; Wu, Fan; Chen, Guihai |
description | This paper proposes a novel contrastive cross-modal knowledge transfer framework, SemiCMT, for multi-modal IoT sensing applications. It effectively transfers the feature extraction capability (also called knowledge) learned from a source modality (e.g., acoustic signals) with abundant unlabeled training data, to a target modality (e.g., seismic signals) that lacks enough training data, in a self-supervised manner with the help of only a small set of synchronized multi-modal pairs. The transferred model can be quickly finetuned to downstream target-modal tasks with only limited labels. The key design consists of three aspects: First, we factorize the latent embedding of each modality into shared and private components and perform knowledge transfer considering both the modality information commonality and gaps. Second, we enforce structural correlation constraints between the source modality and the target modality, to push the target modal embedding space symmetric to the source modal embedding space, with the anchoring of additional source-modal samples, which expands the existing modal-matching objective in current multi-modal contrastive frameworks. Finally, we conduct downstream task finetuning in the spherical space with a KNN classifier to better align with the structured modality embedding space. Extensive evaluations on five multimodal IoT datasets are performed to validate the effectiveness of SemiCMT in cross-modal knowledge transfer, including a new self-collected dataset using seismic and acoustic signals for office activity monitoring. SemiCMT consistently outperforms existing self-supervised and knowledge transfer approaches by up to 36.47% in the finetuned target-modal classification tasks. The code and the self-collected dataset will be released at https://github.com/SJTU-RTEAS/SemiCMT. |
doi_str_mv | 10.1145/3699779 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 2474-9567 |
ispartof | Proceedings of ACM on interactive, mobile, wearable and ubiquitous technologies, 2024-11, Vol.8 (4), p.1-30, Article 198 |
issn | 2474-9567 (ISSN); 2474-9567 (EISSN) |
language | eng |
recordid | cdi_crossref_primary_10_1145_3699779 |
source | ACM Digital Library Complete |
subjects | Computing methodologies; Human-centered computing; Machine learning; Ubiquitous and mobile computing; Ubiquitous and mobile computing theory, concepts and paradigms |
title | SemiCMT: Contrastive Cross-Modal Knowledge Transfer for IoT Sensing with Semi-Paired Multi-Modal Signals |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T02%3A41%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SemiCMT:%20Contrastive%20Cross-Modal%20Knowledge%20Transfer%20for%20IoT%20Sensing%20with%20Semi-Paired%20Multi-Modal%20Signals&rft.jtitle=Proceedings%20of%20ACM%20on%20interactive,%20mobile,%20wearable%20and%20ubiquitous%20technologies&rft.au=Chen,%20Yatong&rft.date=2024-11-21&rft.volume=8&rft.issue=4&rft.spage=1&rft.epage=30&rft.pages=1-30&rft.artnum=198&rft.issn=2474-9567&rft.eissn=2474-9567&rft_id=info:doi/10.1145/3699779&rft_dat=%3Cacm_cross%3E3699779%3C/acm_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |