Variation-aware directed graph convolutional networks for skeleton-based action recognition


Bibliographic Details
Published in: Knowledge-Based Systems, 2024-10, Vol. 302, Article 112319
Authors: Li, Tianchen; Geng, Pei; Cai, Guohui; Hou, Xinran; Lu, Xuequan; Lyu, Lei
Format: Article
Language: English
Online access: Full text
Abstract: Directed graph convolutional networks (DGCNs) have been gaining attention in skeleton-based action recognition, where they capture the hierarchical relationships of the skeleton through a directed graph topology. However, they typically pay equal attention to regions of the skeleton that vary and regions that remain relatively static, which leads to low accuracy on fine-grained actions. To this end, we design an innovative variation-aware directed graph convolutional network (VA-DGCN) that focuses on the regions where variation takes place. VA-DGCN comprises a variation-aware directed spatial convolution (VDSC) module and a multi-scale contrastive temporal convolution (MCTC) module. Specifically, to capture subtle variations in fine-grained actions, VDSC introduces the average posture of the action sequence as a static anchor. Within VDSC, a channel-specific topology branch models the distinct kinematic properties of different channels to extract global features, and a global-attention graph convolution is added to handle joints that are not naturally connected. We then identify the regions with variations by comparing the global features of the action sequence with those of the average posture. In the temporal dimension, MCTC comprises multiple branches that extract temporal features at different scales; to maximize the mutual information between branches, we introduce contrastive learning, which drives the module to learn more meaningful action representations. We conduct extensive experiments on three public datasets to validate the feasibility and efficacy of the proposed VA-DGCN.
Highlights:
•Introducing VA-DGCN with the average posture as an anchor to boost spatial variation emphasis.
•Utilizing a channel-specific topology branch for diverse channel topologies.
•Designing the MCTC module with contrastive learning for improved multi-scale temporal relationships.
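
The abstract describes three mechanisms: a static anchor built from the average posture, a channel-specific topology for the spatial graph convolution, and multi-scale temporal branches trained with a contrastive objective. The PyTorch sketch below illustrates one way these ideas could fit together; all names, shapes, and hyper-parameters (VariationAwareBranch, MultiScaleTemporal, the sigmoid-gated variation map, the InfoNCE-style branch pairing) are illustrative assumptions drawn from the abstract, not the authors' published implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class VariationAwareBranch(nn.Module):
    # Sketch of the VDSC idea: features of the full sequence are compared
    # with features of its average posture (the static anchor), and joints
    # that deviate from the anchor are up-weighted before a graph
    # convolution with a channel-specific topology. (Hypothetical design.)
    def __init__(self, in_channels, out_channels, num_joints):
        super().__init__()
        # Assumed learnable adjacency per output channel, so that
        # different channels can model different kinematic properties.
        self.topology = nn.Parameter(
            torch.randn(out_channels, num_joints, num_joints) * 0.01)
        self.embed = nn.Conv2d(in_channels, out_channels, kernel_size=1)

    def forward(self, x):
        # x: (N, C, T, V) -- batch, channels, frames, joints
        anchor = x.mean(dim=2, keepdim=True)   # average posture, (N, C, 1, V)
        f_seq = self.embed(x)                  # sequence features
        f_anchor = self.embed(anchor)          # static-anchor features
        # Variation map in (0, 1): large where the sequence departs from
        # the static anchor, i.e. where motion actually happens.
        variation = torch.sigmoid((f_seq - f_anchor).abs())
        f = f_seq * variation
        # Per-channel graph convolution over the joint dimension.
        return torch.einsum('nctv,cvw->nctw', f, self.topology)

class MultiScaleTemporal(nn.Module):
    # Sketch of the MCTC idea: parallel temporal convolutions at several
    # dilations extract features at different time scales.
    def __init__(self, channels, dilations=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=(5, 1),
                      padding=(2 * d, 0), dilation=(d, 1))
            for d in dilations)

    def forward(self, x):
        feats = [branch(x) for branch in self.branches]  # one map per scale
        return torch.stack(feats).mean(dim=0), feats

def branch_contrastive_loss(feats, temperature=0.1):
    # Simplified InfoNCE across branches: pool each branch to one vector
    # per clip and treat branch pairs from the same clip as positives,
    # loosely maximizing mutual information between branches.
    z = [F.normalize(f.mean(dim=(2, 3)), dim=1) for f in feats]
    loss, pairs = 0.0, 0
    for i in range(len(z)):
        for j in range(i + 1, len(z)):
            logits = z[i] @ z[j].t() / temperature  # (N, N) similarities
            labels = torch.arange(logits.size(0), device=logits.device)
            loss = loss + F.cross_entropy(logits, labels)
            pairs += 1
    return loss / max(pairs, 1)

In a full model one would presumably stack such spatial and temporal blocks and add branch_contrastive_loss, scaled by a small weight, to the classification loss; this is a reading of the abstract, not the paper's actual training recipe.
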
ISSN: 0950-7051
DOI: 10.1016/j.knosys.2024.112319