Joint Motion Affinity Maps (JMAM) and Their Impact on Deep Learning Models for 3D Sign Language Recognition
Published in: | IEEE Access 2024, Vol. 12, pp. 11258-11275 |
---|---|
Authors: | , , , , , |
Format: | Article |
Language: | English |
Abstract: | Previous works on 3D joint-based feature representations of the human body as colour-coded images (maps) were developed from joint positions, distances, angles, or combinations of these, for applications such as human action (sign language) recognition. These 3D joint maps have been shown to uniquely characterize both the spatial and temporal relationships between the skeletal joints that describe an action (sign). Consequently, the problem of identifying joint positions and motions is transformed into an image classification problem for 3D skeletal sign language (action) recognition. However, the previously proposed process for transforming 3D skeletal joints into colour-coded maps has a negative proportionality component, which results in maps with low pixel densities when the joint relationships are strong. This drawback severely limits the ability of classifiers to learn and quantify the joint relationships within the colour-coded maps. We hypothesized that a positive proportionality between joint motions and the corresponding maps would improve classifier performance. Hence, we propose joint motion affinity maps (JMAMs). JMAMs apply a radial basis kernel to joint distances, which ensures a positive proportionality constant between joint motions and the pixel densities of the colour-coded maps. To further improve 3D sign language classification, this work proposes congruent body-part joints, which yield motion-directed JMAMs with maximally discriminative, positive-definite spatio-temporal features. Finally, JMAMs are trained on the proposed multi-resolution convolutional neural network with spatial attention (MRCNNSA) architecture, which produces promising results on the constructed 3D sign language dataset, KL3DISL. In addition, the proposed approach is benchmarked on online 3D datasets and against standard deep learning models for sign and action recognition.
The results show that JMAMs with clustered joints characterize subtle relationships that are otherwise difficult for a classifier to learn. |
---|---|
ISSN: | 2169-3536 |
DOI: | 10.1109/ACCESS.2024.3354775 |
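The abstract states that JMAMs apply a radial basis kernel to joint distances. The sketch below illustrates only that kernel step, under stated assumptions: a Gaussian RBF applied to pairwise inter-joint Euclidean distances in each frame, a hypothetical bandwidth `sigma`, a hypothetical frames-by-joint-pairs layout, and plain grayscale scaling. The helper names (`rbf`, `joint_affinity_map`, `to_pixels`) are illustrative. This is not the authors' construction: the paper's exact distance definition, body-part clustering and colour coding are not reproduced here.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Joint = Tuple[float, float, float]  # (x, y, z) position of one skeletal joint


def rbf(distance: float, sigma: float) -> float:
    """Gaussian radial basis kernel: 1.0 at distance 0, decaying towards 0."""
    return math.exp(-(distance ** 2) / (2.0 * sigma ** 2))


def joint_affinity_map(
    frames: Sequence[Sequence[Joint]], sigma: float = 1.0
) -> List[List[float]]:
    """Build a (num_frames x num_joint_pairs) map of RBF affinities.

    Hypothetical layout (an assumption, not the paper's): one row per frame,
    one column per unordered joint pair (i, j) with i < j. Values lie in (0, 1].
    """
    if not frames:
        return []
    num_joints = len(frames[0])
    pairs = list(combinations(range(num_joints), 2))
    rows = []
    for joints in frames:
        if len(joints) != num_joints:
            raise ValueError("every frame must contain the same number of joints")
        # Euclidean distance between each joint pair, passed through the kernel.
        rows.append([rbf(math.dist(joints[i], joints[j]), sigma) for i, j in pairs])
    return rows


def to_pixels(affinity_map: List[List[float]]) -> List[List[int]]:
    """Scale affinities in (0, 1] to 8-bit grayscale intensities (0..255)."""
    return [[round(255 * v) for v in row] for row in affinity_map]


if __name__ == "__main__":
    # One frame with three joints; pair distances are 1, 2 and sqrt(5).
    frame = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 2.0, 0.0)]
    print(to_pixels(joint_affinity_map([frame])))  # [[155, 35, 21]]
```

With the Gaussian kernel, coincident joints map to the maximum intensity (255) and intensity falls off smoothly with distance; a larger `sigma` flattens that fall-off. Whether the map rows should index frames, joint pairs, or per-joint displacements between frames is a design choice the abstract does not specify.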