Human Action Recognition (HAR) Using Skeleton-based Spatial Temporal Relative Transformer Network: ST-RTR
Main Authors: | , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online Access: | Order full text |
Summary: | Human Action Recognition (HAR) is an interesting research area in
human-computer interaction, used to monitor the activities of elderly and
disabled individuals affected by physical and mental health conditions. Recently,
skeleton-based HAR has received much attention because skeleton data has proven
robust to changes in lighting, body size, and camera view, and to complex
backgrounds. A key strength of the Spatial-Temporal Graph Convolutional Network
(ST-GCN) is that it automatically learns spatial and temporal patterns from
skeleton sequences. However, its limited receptive field means that it captures
only short-range correlations, whereas understanding human action requires
long-range dependencies. To address this issue, we developed a Spatial-Temporal
Relative Transformer (ST-RTR) model. The ST-RTR includes joint and relay nodes,
which allow efficient communication and data transmission within the network.
These nodes help to break the inherent spatial and temporal skeleton
topologies, which enables the model to better understand long-range human
actions. Furthermore, we combine ST-RTR with a fusion model for further
performance improvements. To assess the ST-RTR method, we conducted
experiments on three skeleton-based HAR benchmarks: NTU RGB+D 60, NTU
RGB+D 120, and UAV-Human. ST-RTR improved CS and CV accuracy by 2.11% and
1.45% on NTU RGB+D 60, and by 1.25% and 1.05% on NTU RGB+D 120. On the
UAV-Human dataset, accuracy improved by 2.54%. These experimental results show
that the proposed ST-RTR model significantly improves action recognition
compared with the standard ST-GCN method. |
---|---|
DOI: | 10.48550/arxiv.2410.23806 |
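
The summary's point about short-range correlation and relay nodes can be illustrated on a toy graph. The sketch below is not the authors' implementation and involves no attention or learning: it assumes a hypothetical 10-joint skeleton (not the NTU RGB+D joint layout) and a single hypothetical relay node linked to every joint, then measures how many hops separate the most distant pair of nodes with and without the relay.

```python
from collections import deque

# Hypothetical 10-joint toy skeleton (an assumption for illustration only):
# 0 head, 1 neck, 2 spine, 3 hip, 4/6 shoulders, 5/7 hands, 8/9 feet.
BONES = [(0, 1), (1, 2), (2, 3), (1, 4), (4, 5), (1, 6), (6, 7), (3, 8), (3, 9)]
NUM_JOINTS = 10


def build_adjacency(num_nodes, edges):
    """Return an undirected adjacency map {node: set(neighbours)}."""
    adj = {n: set() for n in range(num_nodes)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def hops_from(adj, start):
    """Breadth-first search: hop count from `start` to every reachable node."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in dist:
                dist[nxt] = dist[node] + 1
                queue.append(nxt)
    return dist


def max_hops(adj):
    """Largest shortest-path distance between any two nodes (graph diameter)."""
    return max(max(hops_from(adj, n).values()) for n in adj)


# Plain skeleton: information must travel bone by bone.
skeleton = build_adjacency(NUM_JOINTS, BONES)

# Same skeleton plus one hypothetical relay node connected to every joint.
relay = NUM_JOINTS
relay_edges = [(joint, relay) for joint in range(NUM_JOINTS)]
with_relay = build_adjacency(NUM_JOINTS + 1, BONES + relay_edges)

print(max_hops(skeleton))    # 5 (e.g. hand to foot)
print(max_hops(with_relay))  # 2 (any joint -> relay -> any joint)
```

In this toy setting, a layer that only mixes information between directly connected nodes would need five steps to relate a hand to a foot on the plain skeleton, but two steps once the relay node is present, which is the intuition behind bypassing the fixed skeleton topology.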