XYZ-channel encoding and augmentation of human joint skeleton coordinates for end-to-end action recognition
Published in: | Signal, Image and Video Processing, 2024-11, Vol.18 (11), p.7857-7871 |
---|---|
Main authors: | , , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Summary: | Recognizing human actions from skeletal data is a major challenge, as it does not always deliver optimal performance due to the limited ability to discern the spatio-temporal patterns inherent in skeletal data. This study aims to enhance the precision of action recognition by conceptualizing each action as a 3D matrix, accurately capturing spatio-temporal dynamics within images. These matrices offer a comprehensive encapsulation of the dynamic evolution of skeletal joint coordinates (x, y, and z) over time, affording a holistic comprehension of human actions. Using these 3D matrices as three-channel images enables us to capture the rich spatio-temporal information they contain. The suggested XYZ-channel action encoding facilitates the application of data augmentation techniques, thereby enhancing model generalization and robustness. Furthermore, we present a customized CNN architecture designed to efficiently extract spatio-temporal features from XYZ-channel-encoded actions and classify them accurately. Extensive experiments on diverse datasets, including MSR Action3D, UTD-MAD and CZU-MHAD, demonstrate the effectiveness of the proposed CNN architecture. We achieve a test set accuracy of 96% on the MSR Action3D dataset, 97.9% on the UTD-MAD dataset and 98% on the CZU-MHAD dataset, underlining the method’s ability to accurately recognize human actions from skeletal data in challenging scenarios. |
---|---|
ISSN: | 1863-1703 1863-1711 |
DOI: | 10.1007/s11760-024-03434-4 |
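
The summary describes treating the x, y, and z joint coordinates of an action as the three channels of an image. The following is a minimal sketch of that idea, not the authors' implementation: the function names `encode_xyz_image` and `augment_time_shift`, the joints-by-frames layout, the per-channel min-max scaling, and the circular time shift used as an augmentation are all assumptions made for illustration, since the record does not specify them.

```python
import numpy as np


def encode_xyz_image(skeleton: np.ndarray) -> np.ndarray:
    """Map a (frames, joints, 3) skeleton sequence to a (joints, frames, 3) uint8 image."""
    seq = np.asarray(skeleton, dtype=np.float64)
    if seq.ndim != 3 or seq.shape[2] != 3:
        raise ValueError("expected an array of shape (frames, joints, 3)")
    # Per-channel min-max scaling (assumed; the record does not state a normalization).
    lo = seq.min(axis=(0, 1), keepdims=True)
    hi = seq.max(axis=(0, 1), keepdims=True)
    span = np.where(hi > lo, hi - lo, 1.0)  # avoid division by zero on flat channels
    scaled = (seq - lo) / span              # values in [0, 1]
    image = np.round(scaled * 255.0).astype(np.uint8)
    # Rows index joints, columns index time, channels hold x, y, z (assumed layout).
    return image.transpose(1, 0, 2)


def augment_time_shift(image: np.ndarray, shift: int) -> np.ndarray:
    """Circularly shift the encoded action along the time axis (hypothetical augmentation)."""
    return np.roll(image, shift, axis=1)


# Synthetic stand-in for a recorded action: 40 frames, 20 joints, 3 coordinates.
rng = np.random.default_rng(0)
demo = rng.normal(size=(40, 20, 3))
img = encode_xyz_image(demo)
aug = augment_time_shift(img, 5)
```

Once an action is stored as an ordinary three-channel array, standard image augmentations and CNN classifiers can be applied to it directly, which is the property the summary highlights.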