LLaSA: A Multimodal LLM for Human Activity Analysis Through Wearable and Smartphone Sensors

Bibliographic Details
Main Authors: Imran, Sheikh Asif; Khan, Mohammad Nur Hossain; Biswas, Subrata; Islam, Bashima
Format: Article
Language: English
Description
Summary: Integrating inertial measurement units (IMUs) with large language models (LLMs) expands the potential of multimodal AI, enabling more nuanced human activity analysis. In this paper, we introduce LLaSA (Large Language and Sensor Assistant), a multimodal large language model built on LIMU-BERT and Llama, designed to interpret and answer queries related to human activities and motion analysis, leveraging sensor data and contextual reasoning. To develop LLaSA, we introduce two key datasets: SensorCaps, a comprehensive collection of 35,960 IMU-derived narratives with handcrafted features, and OpenSQA, an instruction-following dataset containing 179,727 question-answer pairs aware of the sensor and human activity context. These datasets provide diverse and rich inputs to train LLaSA for complex sensor-based queries. To optimize LLaSA's performance, we apply a unique hyperparameter tuning method, which significantly enhances its effectiveness in contextual question-answering tasks. Extensive evaluations, including a human-led assessment of the question-answering, demonstrate that LLaSA achieves superior data interpretation and context-aware responses compared to GPT-3.5-Turbo and Vicuna-1.5-13b-16K. These contributions advance the frontier of sensor-aware LLMs and create new opportunities for impactful multimodal research in healthcare, sports science, and human-computer interactions. Our code repository and datasets can be found at https://github.com/BASHLab/LLaSA.
DOI: 10.48550/arxiv.2406.14498
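
For readers who want a concrete picture of the kind of architecture the abstract describes, the minimal sketch below shows one common way features from a LIMU-BERT-style IMU encoder could be projected into the embedding space of a Llama-style decoder. Every specific here, including the class name SensorProjector, the two-layer MLP bridge, and the feature dimensions (72 for the sensor encoder, 4096 for the language model), is an illustrative assumption rather than a detail taken from the paper; the authors' actual implementation lives in the linked repository and may differ.

# Hypothetical illustration only -- not the authors' implementation.
# Projects IMU-encoder features into an LLM's hidden space.
import torch
import torch.nn as nn

class SensorProjector(nn.Module):
    """Bridge from a LIMU-BERT-like sensor encoder to a Llama-like decoder.

    All dimensions below are assumptions chosen for illustration.
    """
    def __init__(self, sensor_dim: int = 72, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(sensor_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, sensor_feats: torch.Tensor) -> torch.Tensor:
        # sensor_feats: (batch, timesteps, sensor_dim)
        # returns:      (batch, timesteps, llm_dim), sized to sit alongside
        #               text token embeddings before the decoder
        return self.proj(sensor_feats)

if __name__ == "__main__":
    # A fake window of already-encoded 6-axis IMU data: 120 timesteps, 72-d features.
    fake_sensor_feats = torch.randn(1, 120, 72)
    projected = SensorProjector()(fake_sensor_feats)
    print(projected.shape)  # torch.Size([1, 120, 4096])

In a pipeline of this style, the projected sensor embeddings would be interleaved with the tokenized question before decoding; this is consistent with, but not confirmed by, the abstract's description of combining LIMU-BERT with Llama.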