XYZ-channel encoding and augmentation of human joint skeleton coordinates for end-to-end action recognition

Bibliographic Details
Published in: Signal, Image and Video Processing, 2024-11, Vol. 18 (11), p. 7857-7871
Authors: Elaoud, Amani; Ghazouani, Haythem; Barhoumi, Walid
Format: Article
Language: English
Online access: Full text
Description
Abstract: Recognizing human actions from skeletal data is a major challenge: models do not always deliver optimal performance because of their limited ability to discern the spatio-temporal patterns inherent in skeletal data. This study aims to enhance the precision of action recognition by conceptualizing each action as a 3D matrix that accurately captures spatio-temporal dynamics within images. These matrices offer a comprehensive encapsulation of the dynamic evolution of the skeletal joint coordinates (x, y, and z) over time, affording a holistic comprehension of human actions. Treating these 3D matrices as three-channel images enables us to capture the rich spatio-temporal information they contain. The suggested XYZ-channel action encoding facilitates the application of data augmentation techniques, thereby enhancing model generalization and robustness. Furthermore, we present a customized CNN architecture designed to efficiently extract spatio-temporal features from actions encoded on the XYZ channels and classify them accurately. Extensive experiments on diverse datasets, including MSR Action3D, UTD-MHAD, and CZU-MHAD, demonstrate the effectiveness of the proposed CNN architecture. We achieve a test-set accuracy of 96% on the MSR Action3D dataset, 97.9% on the UTD-MHAD dataset, and 98% on the CZU-MHAD dataset, underlining the method’s ability to accurately recognize human actions from skeletal data in challenging scenarios.
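The abstract describes encoding a skeleton sequence as a three-channel image whose channels carry the x, y, and z joint coordinates over time. The exact layout and normalization are not specified here, so the following is a minimal sketch of the general idea, assuming rows index frames, columns index joints, and each coordinate channel is min-max normalized independently; the function name and these choices are illustrative, not the authors' implementation.

```python
import numpy as np

def skeleton_to_xyz_image(sequence):
    """Encode a skeleton sequence as a three-channel image.

    sequence: array of shape (T, J, 3) holding the x, y, z coordinates
    of J joints over T frames (hypothetical layout; the paper's exact
    ordering is not given in the abstract). Returns a (T, J, 3) uint8
    image whose channels carry the x, y, and z coordinates.
    """
    seq = np.asarray(sequence, dtype=np.float64)
    image = np.empty_like(seq)
    # Normalize each coordinate axis independently to [0, 255] so the
    # dynamics of each axis fill its channel's value range.
    for c in range(3):
        channel = seq[..., c]
        lo, hi = channel.min(), channel.max()
        image[..., c] = (channel - lo) / (hi - lo + 1e-8) * 255.0
    return image.astype(np.uint8)

# Example: a random 60-frame, 20-joint action clip.
clip = np.random.rand(60, 20, 3)
img = skeleton_to_xyz_image(clip)
print(img.shape)  # (60, 20, 3): rows = frames, cols = joints, channels = x/y/z
```

Once actions are represented as images, standard image-augmentation pipelines (e.g., crops, jitter, or noise) can be applied directly, which is presumably what makes the XYZ-channel encoding amenable to the augmentation the abstract mentions.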
ISSN: 1863-1703, 1863-1711
DOI: 10.1007/s11760-024-03434-4