IMU2CLIP: Multimodal Contrastive Learning for IMU Motion Sensors from Egocentric Videos and Text
Format: | Article |
Language: | English |
Abstract: | We present IMU2CLIP, a novel pre-training approach to align Inertial Measurement Unit (IMU) motion sensor recordings with video and text by projecting them into the joint representation space of Contrastive Language-Image Pre-training (CLIP). The proposed approach allows IMU2CLIP to translate human motions (as measured by IMU sensors) into their corresponding textual descriptions and videos, while preserving the transitivity across these modalities. We explore several new IMU-based applications that IMU2CLIP enables, such as motion-based media retrieval and natural-language reasoning tasks over motion data. In addition, we show that IMU2CLIP can significantly improve downstream performance when fine-tuned for each application (e.g., activity recognition), demonstrating its broad applicability as a new pre-trained resource. Our code will be made publicly available. |
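To make the technique the abstract describes concrete, the sketch below shows one plausible setup: an IMU encoder trained with a symmetric InfoNCE-style contrastive loss to match the (frozen) CLIP embeddings of the paired video or text, followed by motion-based media retrieval via cosine similarity. This is not the authors' released implementation; the encoder architecture, the 512-dimensional embedding size, the temperature, and all tensor shapes are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class IMUEncoder(nn.Module):
    """Hypothetical 1D-CNN encoder: raw IMU windows
    (batch, channels=6 for accel+gyro, time) -> CLIP-sized embeddings."""
    def __init__(self, in_channels: int = 6, embed_dim: int = 512):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv1d(in_channels, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.Conv1d(64, 128, kernel_size=5, padding=2), nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),          # pool over the time axis
        )
        self.proj = nn.Linear(128, embed_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.net(x).squeeze(-1)           # (batch, 128)
        return F.normalize(self.proj(h), dim=-1)

def symmetric_contrastive_loss(imu_emb, clip_emb, temperature=0.07):
    """InfoNCE in both directions: matched (IMU, video/text) pairs
    sit on the diagonal of the batch similarity matrix."""
    clip_emb = F.normalize(clip_emb, dim=-1)
    logits = imu_emb @ clip_emb.t() / temperature
    targets = torch.arange(logits.size(0), device=logits.device)
    return (F.cross_entropy(logits, targets)
            + F.cross_entropy(logits.t(), targets)) / 2

encoder = IMUEncoder()
imu = torch.randn(8, 6, 200)       # 8 windows, 6 sensor axes, 200 steps
clip_emb = torch.randn(8, 512)     # stand-in for frozen CLIP embeddings
loss = symmetric_contrastive_loss(encoder(imu), clip_emb)
loss.backward()

# Motion-based media retrieval: rank candidate clips by cosine
# similarity between an IMU query embedding and CLIP video embeddings.
with torch.no_grad():
    query = encoder(torch.randn(1, 6, 200))            # unit-norm (1, 512)
    gallery = F.normalize(torch.randn(100, 512), dim=-1)
    top5 = (query @ gallery.t()).squeeze(0).topk(5).indices
```

One reading of the "transitivity" claim is that, because video and text already share CLIP's joint space, aligning IMU with either modality indirectly aligns it with the other, so the same encoder can serve both video retrieval and text-based reasoning over motion data.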
DOI: | 10.48550/arxiv.2210.14395 |