STSD: spatial–temporal semantic decomposition transformer for skeleton-based action recognition
Published in: | Multimedia systems 2024-02, Vol.30 (1), Article 43 |
---|---|
Main authors: | , |
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Full text |
Abstract: | Skeleton-based human action recognition has attracted widespread interest because skeleton data are highly robust to changes in lighting, camera view, and background complexity. Recent studies have proposed transformer-based methods for encoding the latent information underlying the 3D skeleton. These methods use the self-attention mechanism to model the relationships between joints in skeleton sequences without any predefined graph information, and they have proven effective. However, they ignore two challenging issues: the use of human body-related semantic information and the use of dynamic semantic information. In this work, we propose a novel spatial–temporal semantic decomposition transformer network (STSD-TR) that models dependencies between joints using body-part semantics and sub-action semantics. In STSD-TR, a body parts semantic decomposition module (BPSD) extracts body-part semantic information from the 3D coordinates of the joints, and a temporal-local spatial–temporal attention module (TL-STA) then captures the relationships between joints across several consecutive frames, which can be understood as local sub-action semantic information. Finally, a global spatial–temporal module (GST) aggregates the temporal-local features and generates a global spatial–temporal representation. In addition, we design a BodyParts-Mix strategy, which mixes body parts from two people in a unique manner and further boosts performance. Compared with state-of-the-art methods, our method achieves competitive performance on two large-scale datasets. |
---|---|
ISSN: | 0942-4962 1432-1882 |
DOI: | 10.1007/s00530-023-01251-2 |
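
The abstract describes temporal-local attention only at a high level: joints are related to one another within several consecutive frames. As a rough illustration of that idea, and not the authors' TL-STA module, the sketch below splits a skeleton sequence into non-overlapping windows of frames and applies plain scaled dot-product self-attention over all joint tokens inside each window. The function name `temporal_local_attention`, the non-overlapping windows, the identity query/key/value projections, and the example sizes (8 frames, 25 joints, 3 coordinates) are all assumptions made for this sketch; the record does not specify any of them.

```python
import numpy as np


def temporal_local_attention(x, window):
    """Self-attention restricted to windows of consecutive frames.

    x      : array of shape (T, V, C) = (frames, joints, channels)
    window : number of consecutive frames per local segment (hypothetical parameter)
    Returns an array with the same shape as x.
    """
    T, V, C = x.shape
    if T % window != 0:
        raise ValueError("this sketch requires T to be divisible by window")

    # Each segment holds window * V tokens (every joint in every frame of the window).
    tokens = x.reshape(T // window, window * V, C)

    # Scaled dot-product attention with identity projections (no learned weights).
    scores = tokens @ tokens.transpose(0, 2, 1) / np.sqrt(C)
    scores = scores - scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)

    out = weights @ tokens
    return out.reshape(T, V, C)


# Usage: 8 frames, 25 joints, 3D coordinates (illustrative sizes only).
rng = np.random.default_rng(0)
sequence = rng.normal(size=(8, 25, 3))
local_features = temporal_local_attention(sequence, window=4)
```

Because attention is computed separately per window, features in one window do not depend on frames in another. In the abstract's description, a separate global module (GST) is what aggregates these temporal-local features across the whole sequence.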