Variation-aware directed graph convolutional networks for skeleton-based action recognition
Published in: Knowledge-Based Systems, 2024-10, Vol. 302, p. 112319, Article 112319
Format: Article
Language: English
Online access: Full text
Abstract: Directed graph convolutional networks (DGCNs) have gained attention in skeleton-based action recognition, where they capture the hierarchical relationships of the skeleton via a directed graph topology. However, they typically pay equal attention to regions of the skeleton that vary and regions that remain relatively static, which leads to low accuracy when recognizing fine-grained actions. To this end, we design an innovative variation-aware directed graph convolutional network (VA-DGCN) that focuses on the regions where variation takes place. VA-DGCN comprises a variation-aware directed spatial convolution (VDSC) module and a multi-scale contrastive temporal convolution (MCTC) module. Specifically, to capture subtle variations in fine-grained actions, VDSC introduces the average posture of the action sequence as a static anchor. In VDSC, a channel-specific topology branch models the distinct kinematic properties of different channels to extract global features, and a global-attention graph convolution is added to handle joints that are not naturally connected. We then identify the regions with variations by comparing the global features extracted from the action sequence against those of the average posture. In the temporal dimension, MCTC comprises multiple branches that extract temporal features at different scales. Moreover, to maximize the mutual information between branches, we introduce contrastive learning to drive the module toward more meaningful action representations. We conduct extensive experiments on three public datasets to validate the feasibility and efficacy of the proposed VA-DGCN.
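The core spatial idea, comparing each frame against the sequence's average posture to locate moving regions, can be sketched in a few lines. This is only an illustrative toy (function names, the softmax weighting, and the norm-based magnitude are our assumptions; the paper's VDSC uses learned graph convolutions, not this closed form):

```python
import numpy as np

def variation_anchor_features(seq):
    """Toy sketch of the static-anchor idea behind VDSC.

    seq: array of shape (T, V, C) -- T frames, V joints, C channels.
    Returns the average posture (the static anchor) and per-joint
    attention weights that emphasize joints deviating from it.
    """
    anchor = seq.mean(axis=0, keepdims=True)       # (1, V, C) average posture
    variation = seq - anchor                       # deviation from the anchor
    # per-joint variation magnitude, averaged over time
    magnitude = np.linalg.norm(variation, axis=-1).mean(axis=0)   # (V,)
    # softmax over joints: larger weight on regions where variation occurs
    w = np.exp(magnitude - magnitude.max())
    weights = w / w.sum()
    return anchor.squeeze(0), weights

# toy sequence: joint 0 is static, joint 1 oscillates on one channel
T, V, C = 8, 2, 3
seq = np.zeros((T, V, C))
seq[:, 1, 0] = np.sin(np.linspace(0, np.pi, T))
anchor, weights = variation_anchor_features(seq)
```

In this toy setup the moving joint receives the larger attention weight, mirroring the abstract's claim that variation regions, not static ones, should dominate the spatial features.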
Highlights:
- Introducing VA-DGCN with the average posture as an anchor to boost spatial variation emphasis.
- Utilizing a channel-specific topology branch for diverse channel topologies.
- Designing the MCTC module with contrastive learning for improved multi-scale temporal relationships.
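The multi-scale temporal side can likewise be sketched with plain moving-average branches and a cosine-agreement score, the quantity that an InfoNCE-style contrastive term would maximize between branches. Everything here (function names, the choice of scales, mean-pooling) is an illustrative assumption; the actual MCTC uses learned temporal convolutions:

```python
import numpy as np

def multi_scale_temporal_features(x, scales=(1, 2, 4)):
    """Toy multi-scale temporal branches: smooth the sequence with
    moving averages of different window sizes, then pool over time.

    x: array of shape (T, D) -- T frames, D feature dims.
    Returns one (D,) embedding per scale.
    """
    feats = []
    for s in scales:
        kernel = np.ones(s) / s
        # smooth each feature dimension with a length-s window
        smoothed = np.stack(
            [np.convolve(x[:, d], kernel, mode="same") for d in range(x.shape[1])],
            axis=1,
        )
        feats.append(smoothed.mean(axis=0))   # temporal pooling -> (D,)
    return feats

def cosine_agreement(a, b):
    """Cosine similarity between two branch embeddings; a contrastive
    loss would pull this up for branches viewing the same action."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-8))

rng = np.random.default_rng(0)
x = rng.standard_normal((16, 4))      # 16 frames, 4 feature dims
feats = multi_scale_temporal_features(x)
```

Maximizing agreement between scale-specific embeddings of the same sequence, while contrasting against other sequences, is one standard way to realize the "mutual information between branches" objective the abstract describes.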
ISSN: 0950-7051
DOI: 10.1016/j.knosys.2024.112319