Leveraging spatio-temporal features using graph neural networks for human activity recognition


Bibliographic details
Published in: Pattern Recognition, 2024-06, Vol. 150, p. 110301, Article 110301
Main authors: Raj, M.S. Subodh; George, Sudhish N.; Raja, Kiran
Format: Article
Language: English
Description
Summary: Unsupervised human activity recognition (HAR) algorithms working on motion capture (mocap) data often use spatial information and neglect the activity-specific information contained in the temporal sequences. In this work, we propose a new unsupervised algorithm for HAR from mocap data to leverage both spatial and temporal information embedded in activity sequences. For this, we employ a shallow graph neural network (GNN) comprising a graph convolutional network and a gated recurrent unit to aggregate the spatial and temporal features of the mocap sequences, respectively. Moreover, we encode the transformations of the human body through log-regularized kernel covariance descriptors linked to the trajectory movement maps of mocap frames. These descriptors are then fused with the GNN features for downstream activity recognition tasks. Finally, HAR is performed by a new unsupervised algorithm using a neighborhood Laplacian regularizer and a normalized dictionary learning approach. The generalizability of the proposed model is validated by training the GNN on a public dataset and testing on the other datasets. The performance of the proposed model is evaluated using six publicly available human mocap datasets. Compared to existing approaches, the proposed model improves activity recognition consistently by 12%–30% across different datasets.

• Unsupervised algorithm for human activity recognition from motion capture data.
• Feature fusion model employing graph neural network for robust feature extraction.
• Normalized dictionary learning approach for representation matrix generation.
• Log-regularized kernel covariance descriptors encode human body transformations.
• Neighborhood Laplacian regularizer captures the dependencies of activity subspaces.
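
The summary describes a graph convolutional network that aggregates spatial features per frame and a gated recurrent unit that aggregates them over time. The following is a minimal NumPy sketch of that general pattern only, not the authors' implementation: the five-joint chain skeleton, the layer sizes, the mean pooling over joints, the random untrained weights, and all function names are assumptions made for illustration.

```python
import numpy as np


def normalized_adjacency(edges, num_joints):
    """Symmetrically normalized adjacency with self-loops: D^-1/2 (A + I) D^-1/2."""
    a = np.eye(num_joints)
    for i, j in edges:
        a[i, j] = a[j, i] = 1.0
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(axis=1))
    return a * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]


def gcn_layer(x, a_hat, w):
    """One graph convolution, ReLU(A_hat X W). x: (joints, in), w: (in, out)."""
    return np.maximum(a_hat @ x @ w, 0.0)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def gru_step(h, x, p):
    """One GRU update of hidden state h given input vector x."""
    z = sigmoid(x @ p["Wz"] + h @ p["Uz"])              # update gate
    r = sigmoid(x @ p["Wr"] + h @ p["Ur"])              # reset gate
    h_tilde = np.tanh(x @ p["Wh"] + (r * h) @ p["Uh"])  # candidate state
    return (1.0 - z) * h + z * h_tilde


def encode_sequence(frames, a_hat, w, p, hidden):
    """Spatial aggregation per frame (GCN), then temporal aggregation (GRU)."""
    h = np.zeros(hidden)
    for x in frames:  # x: (joints, 3) joint coordinates of one frame
        spatial = gcn_layer(x, a_hat, w).mean(axis=0)  # assumed: mean-pool joints
        h = gru_step(h, spatial, p)
    return h


# Hypothetical toy setup: 5 joints in a chain, 20 frames of random 3-D positions.
rng = np.random.default_rng(0)
num_joints, gcn_out, hidden = 5, 8, 16
edges = [(0, 1), (1, 2), (2, 3), (3, 4)]
a_hat = normalized_adjacency(edges, num_joints)
w = rng.normal(scale=0.1, size=(3, gcn_out))
p = {
    name: rng.normal(scale=0.1,
                     size=(gcn_out if name[0] == "W" else hidden, hidden))
    for name in ("Wz", "Uz", "Wr", "Ur", "Wh", "Uh")
}
frames = rng.normal(size=(20, num_joints, 3))
feature = encode_sequence(frames, a_hat, w, p, hidden)
```

Here `feature` is a single fixed-length vector per sequence. In the pipeline the summary outlines, such a vector would be fused with the covariance descriptors before the unsupervised recognition step; that fusion and the dictionary learning stage are not sketched here.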
ISSN: 0031-3203, 1873-5142
DOI: 10.1016/j.patcog.2024.110301