SemiCMT: Contrastive Cross-Modal Knowledge Transfer for IoT Sensing with Semi-Paired Multi-Modal Signals
Published in: | Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies, 2024-11, Vol. 8 (4), p. 1-30, Article 198 |
Authors: | , , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online access: | Full text |
Summary: | This paper proposes a novel contrastive cross-modal knowledge transfer framework, SemiCMT, for multi-modal IoT sensing applications. It effectively transfers the feature extraction capability (also called knowledge) learned from a source modality (e.g., acoustic signals) with abundant unlabeled training data to a target modality (e.g., seismic signals) that lacks sufficient training data, in a self-supervised manner with the help of only a small set of synchronized multi-modal pairs. The transferred model can be quickly finetuned for downstream target-modal tasks with only limited labels. The key design consists of three aspects: First, we factorize the latent embedding of each modality into shared and private components and perform knowledge transfer considering both the information commonality and the gaps between modalities. Second, we enforce structural correlation constraints between the source and target modalities, making the target-modal embedding space symmetric to the source-modal embedding space with additional source-modal samples as anchors, which extends the modal-matching objective used in current multi-modal contrastive frameworks. Finally, we conduct downstream task finetuning in the spherical space with a KNN classifier, to better align with the structured modality embedding space. Extensive evaluations on five multi-modal IoT datasets, including a new self-collected dataset of seismic and acoustic signals for office activity monitoring, validate the effectiveness of SemiCMT in cross-modal knowledge transfer. SemiCMT consistently outperforms existing self-supervised and knowledge transfer approaches by up to 36.47% on the finetuned target-modal classification tasks. The code and the self-collected dataset will be released at https://github.com/SJTU-RTEAS/SemiCMT. |
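To make the three design aspects above concrete, the following is a minimal, hypothetical PyTorch sketch based only on the abstract; it is not the authors' released implementation. All names (SharedPrivateEncoder, nt_xent, semicmt_step) and the exact loss forms are illustrative assumptions.

```python
# Hypothetical sketch of SemiCMT's core ideas as described in the abstract.
# Assumes PyTorch; module and function names are illustrative, not the paper's API.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SharedPrivateEncoder(nn.Module):
    """Encodes one modality and factorizes its embedding into a shared
    component (aligned across modalities) and a private component
    (modality-specific), per the first design aspect."""
    def __init__(self, in_dim, hid_dim=256, emb_dim=128):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Linear(in_dim, hid_dim), nn.ReLU(),
            nn.Linear(hid_dim, hid_dim), nn.ReLU(),
        )
        self.shared_head = nn.Linear(hid_dim, emb_dim)
        self.private_head = nn.Linear(hid_dim, emb_dim)

    def forward(self, x):
        h = self.backbone(x)
        # L2-normalize so embeddings live on the unit hypersphere,
        # matching the spherical fine-tuning space described above.
        z_shared = F.normalize(self.shared_head(h), dim=-1)
        z_private = F.normalize(self.private_head(h), dim=-1)
        return z_shared, z_private

def nt_xent(za, zb, temperature=0.1):
    """Standard InfoNCE / NT-Xent loss matching paired samples
    (za[i], zb[i]) against in-batch negatives."""
    logits = za @ zb.t() / temperature
    targets = torch.arange(za.size(0), device=za.device)
    return F.cross_entropy(logits, targets)

def semicmt_step(src_enc, tgt_enc, x_src, x_tgt, x_src_extra, alpha=1.0):
    """One hypothetical training step on a small synchronized batch
    (x_src, x_tgt) plus extra unpaired source-modal samples x_src_extra.
    - Shared components of paired samples are pulled together
      (the usual modal-matching objective).
    - The pairwise similarity structure of the source space, anchored by
      the extra source samples, is distilled into the target space: one
      reading of the 'structural correlation constraint' in the abstract."""
    zs_sh, _ = src_enc(x_src)
    zt_sh, _ = tgt_enc(x_tgt)
    match_loss = nt_xent(zs_sh, zt_sh)

    with torch.no_grad():
        za_sh, _ = src_enc(x_src_extra)          # anchor embeddings
        sim_src = zs_sh.detach() @ za_sh.t()     # source-side structure
    sim_tgt = zt_sh @ za_sh.t()                  # target-side structure
    struct_loss = F.mse_loss(sim_tgt, sim_src)   # push spaces toward symmetry

    return match_loss + alpha * struct_loss
```

Under these assumptions, the spherical-space KNN fine-tuning the abstract describes would then amount to nearest-neighbor lookup under cosine similarity over the normalized shared embeddings of a few labeled target-modal samples.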
ISSN: | 2474-9567 |
DOI: | 10.1145/3699779 |