Combining skeleton and accelerometer data for human fine-grained activity recognition and abnormal behaviour detection with deep temporal convolutional networks
Published in: | Multimedia Tools and Applications, 2021-08, Vol. 80 (19), pp. 28919-28940 |
---|---|
Main authors: | , , , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Single-modality sensing has been widely adopted for human activity recognition (HAR) for decades and has made significant strides. However, it often suffers from challenges such as noise, obstacles, or dropped signals, which can degrade recognition performance. In this paper, we propose a multi-modality sensing framework for fine-grained human activity recognition and abnormal behaviour detection that combines skeleton and acceleration data at the feature level (so-called feature-level fusion). Firstly, deep temporal convolutional networks (TCN), built from dilated causal convolution components, are used for feature learning and for handling temporal properties. The feature map learnt by the convolutional layers of the TCN is fed into two fully connected layers for prediction. Secondly, we conduct an empirical experiment to verify the proposed method. Experimental results show that the proposed method achieves an F1-score of 83% and surpasses several single-modality models, as well as early and late fusion methods, on the Continuous Multimodal Multi-view Dataset of Human Fall (CMDFALL), which comprises 20 fine-grained normal and abnormal activities collected from 50 subjects. Moreover, the proposed architecture achieves 96.98% accuracy on the UTD-MHAD dataset, which has 8 subjects and 27 activities. These results indicate the effectiveness of the proposed method for classifying fine-grained normal and abnormal human activities, as well as its potential for HAR-based situated service applications. |
---|---|
ISSN: | 1380-7501, 1573-7721 |
DOI: | 10.1007/s11042-021-11058-w |
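
The abstract names two building blocks, dilated causal convolution and feature-level fusion, without giving details. The following is a minimal pure-Python sketch of both ideas, not the authors' implementation. The function names, the single-channel signal, the kernel values, and the use of plain concatenation for fusion are all assumptions made for illustration.

```python
from typing import List, Sequence


def dilated_causal_conv1d(
    x: Sequence[float], kernel: Sequence[float], dilation: int = 1
) -> List[float]:
    """Compute y[t] = sum_k kernel[k] * x[t - k * dilation].

    Samples before the start of the signal are treated as zero, so each
    output depends only on the current and past inputs (causality).
    """
    out = []
    for t in range(len(x)):
        acc = 0.0
        for k, w in enumerate(kernel):
            idx = t - k * dilation
            if idx >= 0:  # implicit left zero-padding
                acc += w * x[idx]
        out.append(acc)
    return out


def fuse_features(
    skeleton_feat: Sequence[float], accel_feat: Sequence[float]
) -> List[float]:
    """Feature-level fusion, assumed here to be simple concatenation."""
    return list(skeleton_feat) + list(accel_feat)


# Hypothetical single-channel inputs standing in for each modality.
skeleton_signal = [1.0, 2.0, 3.0, 4.0, 5.0]
accel_signal = [0.5, 0.0, -0.5, 0.0, 0.5]

# With kernel [1, 1] and dilation 2, y[t] = x[t] + x[t - 2].
skeleton_feat = dilated_causal_conv1d(skeleton_signal, [1.0, 1.0], dilation=2)
accel_feat = dilated_causal_conv1d(accel_signal, [1.0, 1.0], dilation=2)

fused = fuse_features(skeleton_feat, accel_feat)
print(skeleton_feat)  # [1.0, 2.0, 4.0, 6.0, 8.0]
print(len(fused))     # 10
```

In a full TCN, several such layers would be stacked with increasing dilation to widen the receptive field, and the fused vector would feed the fully connected layers that the abstract mentions.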