SemiCMT: Contrastive Cross-Modal Knowledge Transfer for IoT Sensing with Semi-Paired Multi-Modal Signals
This paper proposes a novel contrastive cross-modal knowledge transfer framework, SemiCMT, for multi-modal IoT sensing applications. It effectively transfers the feature extraction capability (also called knowledge) learned from a source modality (e.g., acoustic signals) with abundant unlabeled training data to a target modality (e.g., seismic signals) that lacks sufficient training data, in a self-supervised manner with the help of only a small set of synchronized multi-modal pairs. The transferred model can be quickly finetuned to downstream target-modal tasks with only limited labels. The key design consists of three aspects: First, we factorize the latent embedding of each modality into shared and private components and perform knowledge transfer considering both the modality information commonality and gaps. Second, we enforce structural correlation constraints between the source modality and the target modality, to push the target modal embedding space symmetric to the source modal embedding space, with the anchoring of additional source-modal samples, which expands the existing modal-matching objective in current multi-modal contrastive frameworks. Finally, we conduct downstream task finetuning in the spherical space with a KNN classifier to better align with the structured modality embedding space. Extensive evaluations on five multimodal IoT datasets are performed to validate the effectiveness of SemiCMT in cross-modal knowledge transfer, including a new self-collected dataset using seismic and acoustic signals for office activity monitoring. SemiCMT consistently outperforms existing self-supervised and knowledge transfer approaches by up to 36.47% in the finetuned target-modal classification tasks. The code and the self-collected dataset will be released at https://github.com/SJTU-RTEAS/SemiCMT.
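The abstract's third design aspect, downstream finetuning with a KNN classifier in the spherical (L2-normalized) embedding space, can be illustrated with a minimal sketch. This is not the authors' released implementation (the GitHub repository above is the authoritative source); the function names, array shapes, and the cosine-similarity KNN are illustrative assumptions based only on the abstract.

```python
# Minimal sketch, assuming target-modal embeddings come from the transferred
# encoder: classify labeled samples with a KNN classifier on the unit sphere,
# where "nearest" is measured by cosine similarity.
import numpy as np

def l2_normalize(x, eps=1e-8):
    """Project embeddings onto the unit sphere (row-wise L2 normalization)."""
    return x / (np.linalg.norm(x, axis=1, keepdims=True) + eps)

def knn_predict(train_emb, train_labels, query_emb, k=5):
    """Predict labels by majority vote over the k nearest neighbors on the sphere."""
    train_emb = l2_normalize(train_emb)
    query_emb = l2_normalize(query_emb)
    sims = query_emb @ train_emb.T               # cosine similarities, shape (Q, N)
    nn_idx = np.argsort(-sims, axis=1)[:, :k]    # indices of the k most similar samples
    nn_labels = train_labels[nn_idx]             # labels of those neighbors, shape (Q, k)
    return np.array([np.bincount(row).argmax() for row in nn_labels])

# Toy usage with random stand-ins for labeled target-modal embeddings.
rng = np.random.default_rng(0)
train_emb = rng.normal(size=(100, 64))
train_labels = rng.integers(0, 4, size=100)
query_emb = rng.normal(size=(10, 64))
print(knn_predict(train_emb, train_labels, query_emb, k=5))
```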
Saved in:
Published in: | Proceedings of ACM on interactive, mobile, wearable and ubiquitous technologies, 2024-11, Vol.8 (4), p.1-30, Article 198 |
---|---|
Main Authors: | Chen, Yatong; Hu, Chenzhi; Kimura, Tomoyoshi; Li, Qinya; Liu, Shengzhong; Wu, Fan; Chen, Guihai |
Format: | Article |
Language: | English |
Subjects: | Computing methodologies; Human-centered computing; Machine learning; Ubiquitous and mobile computing; Ubiquitous and mobile computing theory, concepts and paradigms |
Online access: | Full text |
container_end_page | 30 |
---|---|
container_issue | 4 |
container_start_page | 1 |
container_title | Proceedings of ACM on interactive, mobile, wearable and ubiquitous technologies |
container_volume | 8 |
creator | Chen, Yatong; Hu, Chenzhi; Kimura, Tomoyoshi; Li, Qinya; Liu, Shengzhong; Wu, Fan; Chen, Guihai |
description | This paper proposes a novel contrastive cross-modal knowledge transfer framework, SemiCMT, for multi-modal IoT sensing applications. It effectively transfers the feature extraction capability (also called knowledge) learned from a source modality (e.g., acoustic signals) with abundant unlabeled training data, to a target modality (e.g., seismic signals) that lacks enough training data, in a self-supervised manner with the help of only a small set of synchronized multi-modal pairs. The transferred model can be quickly finetuned to downstream target-modal tasks with only limited labels. The key design consists of three aspects: First, we factorize the latent embedding of each modality into shared and private components and perform knowledge transfer considering both the modality information commonality and gaps. Second, we enforce structural correlation constraints between the source modality and the target modality, to push the target modal embedding space symmetric to the source modal embedding space, with the anchoring of additional source-modal samples, which expands the existing modal-matching objective in current multi-modal contrastive frameworks. Finally, we conduct downstream task finetuning in the spherical space with a KNN classifier to better align with the structured modality embedding space. Extensive evaluations on five multimodal IoT datasets are performed to validate the effectiveness of SemiCMT in cross-modal knowledge transfer, including a new self-collected dataset using seismic and acoustic signals for office activity monitoring. SemiCMT consistently outperforms existing self-supervised and knowledge transfer approaches by up to 36.47% in the finetuned target-modal classification tasks. The code and the self-collected dataset will be released at https://github.com/SJTU-RTEAS/SemiCMT. |
doi_str_mv | 10.1145/3699779 |
format | Article |
fulltext | fulltext |
identifier | ISSN: 2474-9567 |
ispartof | Proceedings of ACM on interactive, mobile, wearable and ubiquitous technologies, 2024-11, Vol.8 (4), p.1-30, Article 198 |
issn | 2474-9567 (ISSN); 2474-9567 (EISSN) |
language | eng |
recordid | cdi_crossref_primary_10_1145_3699779 |
source | ACM Digital Library Complete |
subjects | Computing methodologies; Human-centered computing; Machine learning; Ubiquitous and mobile computing; Ubiquitous and mobile computing theory, concepts and paradigms |
title | SemiCMT: Contrastive Cross-Modal Knowledge Transfer for IoT Sensing with Semi-Paired Multi-Modal Signals |
url | https://sfx.bib-bvb.de/sfx_tum?ctx_ver=Z39.88-2004&ctx_enc=info:ofi/enc:UTF-8&ctx_tim=2025-01-10T02%3A41%3A57IST&url_ver=Z39.88-2004&url_ctx_fmt=infofi/fmt:kev:mtx:ctx&rfr_id=info:sid/primo.exlibrisgroup.com:primo3-Article-acm_cross&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.genre=article&rft.atitle=SemiCMT:%20Contrastive%20Cross-Modal%20Knowledge%20Transfer%20for%20IoT%20Sensing%20with%20Semi-Paired%20Multi-Modal%20Signals&rft.jtitle=Proceedings%20of%20ACM%20on%20interactive,%20mobile,%20wearable%20and%20ubiquitous%20technologies&rft.au=Chen,%20Yatong&rft.date=2024-11-21&rft.volume=8&rft.issue=4&rft.spage=1&rft.epage=30&rft.pages=1-30&rft.artnum=198&rft.issn=2474-9567&rft.eissn=2474-9567&rft_id=info:doi/10.1145/3699779&rft_dat=%3Cacm_cross%3E3699779%3C/acm_cross%3E%3Curl%3E%3C/url%3E&disable_directlink=true&sfx.directlink=off&sfx.report_link=0&rft_id=info:oai/&rft_id=info:pmid/&rfr_iscdi=true |