Combining skeleton and accelerometer data for human fine-grained activity recognition and abnormal behaviour detection with deep temporal convolutional networks

Single sensing modality is widely adopted for human activity recognition (HAR) for decades and it has made a significant stride. However, it often suffers from challenges such as noises, obstacles, or dropped signals, which might negatively impact on the recognition performance. In this paper, we pr...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	Multimedia tools and applications 2021-08, Vol.80 (19), p.28919-28940
Hauptverfasser:	Pham, Cuong, Nguyen, Linh, Nguyen, Anh, Nguyen, Ngon, Nguyen, Van-Toi
Format:	Artikel
Sprache:	eng
Schlagworte:	Accelerometers Computer Communication Networks Computer Science Data Structures and Information Theory Datasets Feature maps Human activity recognition Human influences Machine learning Materials handling Moving object recognition Multimedia Information Systems Special Purpose and Application-Based Systems
Online-Zugang:	Volltext
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Single sensing modality is widely adopted for human activity recognition (HAR) for decades and it has made a significant stride. However, it often suffers from challenges such as noises, obstacles, or dropped signals, which might negatively impact on the recognition performance. In this paper, we propose a multi-sensing modality framework for human fine-grained activity recognition and abnormal behaviour detection by combining skeleton and acceleration data at feature level (so-called feature-level fusion). Firstly, deep temporal convolutional networks (TCN), consisting of the dilated causal convolution components, are utilized for feature learning and handling temporal properties. The feature map learnt and represented with convolutional layers in TCN is fed into two fully connected layers for the prediction. Secondly, we conduct an empirical experiment to verify our proposed method. Experimental results have shown that the proposed method could achieve 83% F1-score and surpassed several single modality models as well as early and late fusion methods on the Continuous Multimodal Multi-view Dataset of Human Fall Dataset (CMDFALL), comprised of 20 fine-grained normal and abnormal activities collected from 50 subjects. Moreover, our proposed architecture achieves 96.98% accuracy on the UTD-MHAD dataset, which has 8 subjects and 27 activities. These results indicate the effectiveness of our proposed method for the classification of human fine-grained normal and abnormal activities as well as the potential for HAR-based situated service applications.
ISSN:	1380-7501 1573-7721
DOI:	10.1007/s11042-021-11058-w