SemiCMT: Contrastive Cross-Modal Knowledge Transfer for IoT Sensing with Semi-Paired Multi-Modal Signals

This paper proposes a novel contrastive cross-modal knowledge transfer framework, SemiCMT, for multi-modal IoT sensing applications. It effectively transfers the feature extraction capability (also called knowledge) learned from a source modality (e.g., acoustic signals) with abundant unlabeled training data to a target modality (e.g., seismic signals) that lacks sufficient training data, in a self-supervised manner with the help of only a small set of synchronized multi-modal pairs. The transferred model can be quickly finetuned to downstream target-modal tasks with only limited labels. The key design consists of three aspects: First, we factorize the latent embedding of each modality into shared and private components and perform knowledge transfer considering both the modality information commonality and gaps. Second, we enforce structural correlation constraints between the source modality and the target modality, to push the target modal embedding space to be symmetric to the source modal embedding space, with the anchoring of additional source-modal samples, which expands the existing modal-matching objective in current multi-modal contrastive frameworks. Finally, we conduct downstream task finetuning in the spherical space with a KNN classifier to better align with the structured modality embedding space. Extensive evaluations on five multi-modal IoT datasets validate the effectiveness of SemiCMT in cross-modal knowledge transfer, including a new self-collected dataset using seismic and acoustic signals for office activity monitoring. SemiCMT consistently outperforms existing self-supervised and knowledge transfer approaches by up to 36.47% in the finetuned target-modal classification tasks. The code and the self-collected dataset will be released at https://github.com/SJTU-RTEAS/SemiCMT.
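The abstract describes factorizing each modality's latent embedding into shared and private components and aligning synchronized pairs contrastively. The sketch below is a minimal, hypothetical illustration of that idea, not the released SemiCMT code: each encoder emits a shared and a private component, and an InfoNCE-style loss aligns the shared components of matched source/target pairs. All module names, layer sizes, and the temperature value are assumptions made for illustration.

```python
# Minimal sketch (illustrative, not the authors' implementation): a shared/private
# factorized encoder per modality, with an InfoNCE-style contrastive loss applied
# only to the shared components of synchronized source/target pairs.
import torch
import torch.nn as nn
import torch.nn.functional as F

class FactorizedEncoder(nn.Module):
    """Encodes one modality into a shared and a private embedding component."""
    def __init__(self, in_dim: int, emb_dim: int):
        super().__init__()
        self.backbone = nn.Sequential(nn.Linear(in_dim, 256), nn.ReLU())
        self.shared_head = nn.Linear(256, emb_dim)   # aligned across modalities
        self.private_head = nn.Linear(256, emb_dim)  # modality-specific information

    def forward(self, x):
        h = self.backbone(x)
        return self.shared_head(h), self.private_head(h)

def shared_contrastive_loss(z_src, z_tgt, temperature=0.07):
    """InfoNCE over synchronized pairs: matched pairs sit on the diagonal."""
    z_src = F.normalize(z_src, dim=-1)
    z_tgt = F.normalize(z_tgt, dim=-1)
    logits = z_src @ z_tgt.t() / temperature
    labels = torch.arange(z_src.size(0))
    return F.cross_entropy(logits, labels)

# Toy usage with random "acoustic" (source) and "seismic" (target) batches.
src_enc, tgt_enc = FactorizedEncoder(128, 64), FactorizedEncoder(64, 64)
x_src, x_tgt = torch.randn(32, 128), torch.randn(32, 64)
s_src, _ = src_enc(x_src)
s_tgt, _ = tgt_enc(x_tgt)
loss = shared_contrastive_loss(s_src, s_tgt)
```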

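The abstract also mentions finetuning downstream tasks in the spherical space with a KNN classifier. A minimal sketch of that step is shown below, under the assumption that embeddings are L2-normalized and neighbors are ranked by cosine similarity; the data, dimensions, and neighbor count are random placeholders rather than the paper's pipeline.

```python
# Minimal sketch (illustrative): few-shot classification of L2-normalized
# target-modality embeddings with a KNN classifier. On the unit sphere,
# Euclidean distance ranks neighbors identically to cosine distance, so the
# default metric suffices once the vectors are normalized.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)
train_emb = rng.normal(size=(100, 64))   # embeddings from the transferred encoder
train_y = rng.integers(0, 5, size=100)   # a handful of labeled target-modal samples
test_emb = rng.normal(size=(20, 64))

def l2_normalize(x):
    return x / np.linalg.norm(x, axis=1, keepdims=True)

knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(l2_normalize(train_emb), train_y)
pred = knn.predict(l2_normalize(test_emb))
```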
Bibliographic Details

Published in: Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2024-11-21, Vol. 8 (4), pp. 1-30, Article 198
Main authors: Chen, Yatong; Hu, Chenzhi; Kimura, Tomoyoshi; Li, Qinya; Liu, Shengzhong; Wu, Fan; Chen, Guihai
Format: Article
Language: English
Subjects: Computing methodologies; Human-centered computing; Machine learning; Ubiquitous and mobile computing; Ubiquitous and mobile computing theory, concepts and paradigms
Publisher: ACM, New York, NY, USA
ISSN/EISSN: 2474-9567
DOI: 10.1145/3699779
Source: ACM Digital Library Complete
Online access: Full text