Enhancing Inertial Hand based HAR through Joint Representation of Language, Pose and Synthetic IMUs
Main authors: , , , ,
Format: Article
Language: eng
Subjects:
Online access: Order full text
Abstract: Due to the scarcity of labeled sensor data in HAR, prior research has turned to video data to synthesize Inertial Measurement Unit (IMU) data, capitalizing on its rich activity annotations. However, generating IMU data from videos presents challenges for HAR in real-world settings, attributed to the poor quality of synthetic IMU data and its limited efficacy for subtle, fine-grained motions. In this paper, we propose Multi$^3$Net, our novel multi-modal, multitask, and contrastive learning-based framework to address the issue of limited data. Our pretraining procedure uses videos from online repositories, aiming to learn joint representations of text, pose, and IMU simultaneously. By employing video data and contrastive learning, our method seeks to enhance wearable HAR performance, especially in recognizing subtle activities. Our experimental findings validate the effectiveness of our approach in improving HAR performance with IMU data. We demonstrate that models trained with synthetic IMU data generated from videos using our method surpass existing approaches in recognizing fine-grained activities.
DOI: 10.48550/arxiv.2406.01316
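
The abstract describes contrastive pretraining of joint text, pose, and IMU representations. Below is a minimal sketch of what such a tri-modal contrastive objective could look like, assuming simple MLP encoders, a symmetric InfoNCE loss, and illustrative input dimensions; it is an assumption-laden illustration, not the authors' Multi$^3$Net implementation.

```python
# Hypothetical sketch: CLIP-style contrastive alignment of text, pose, and IMU
# embeddings. Encoder architectures, feature dimensions, and the loss
# combination are illustrative assumptions, not the Multi^3Net design.
import torch
import torch.nn as nn
import torch.nn.functional as F


class ModalityEncoder(nn.Module):
    """Maps one modality (flattened window) into a shared embedding space."""

    def __init__(self, in_dim: int, embed_dim: int = 128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, 256), nn.ReLU(),
            nn.Linear(256, embed_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return F.normalize(self.net(x), dim=-1)  # unit-norm embeddings


def info_nce(a: torch.Tensor, b: torch.Tensor, temperature: float = 0.07) -> torch.Tensor:
    """Symmetric InfoNCE: matching (a_i, b_i) pairs in the batch are positives."""
    logits = a @ b.t() / temperature
    targets = torch.arange(a.size(0), device=a.device)
    return 0.5 * (F.cross_entropy(logits, targets) + F.cross_entropy(logits.t(), targets))


# Toy batch: 8 aligned windows with assumed text / pose / IMU feature sizes.
text_enc = ModalityEncoder(in_dim=300)          # e.g. sentence embedding
pose_enc = ModalityEncoder(in_dim=2 * 17 * 30)  # e.g. 17 2D joints x 30 frames
imu_enc = ModalityEncoder(in_dim=6 * 100)       # e.g. 6-axis IMU x 100 samples

text = torch.randn(8, 300)
pose = torch.randn(8, 2 * 17 * 30)
imu = torch.randn(8, 6 * 100)

z_t, z_p, z_i = text_enc(text), pose_enc(pose), imu_enc(imu)
# Joint objective: align every pair of modalities in the same embedding space.
loss = info_nce(z_t, z_p) + info_nce(z_t, z_i) + info_nce(z_p, z_i)
loss.backward()
```

Summing the three pairwise InfoNCE terms pulls all modalities into one shared embedding space, which is one common way to realize the "joint representations of text, pose, and IMU" mentioned in the abstract.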