Segment differential aggregation representation and supervised compensation learning of ConvNets for human action recognition
Saved in:
Published in: | Science China. Technological Sciences, 2024, Vol. 67(1), p. 197-208 |
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Full text |
Abstract: | With more multi-modal data available for visual classification tasks, human action recognition has become an increasingly attractive topic. However, one of the main challenges is to effectively extract complementary features from different modalities for action recognition. In this work, a novel multimodal supervised learning framework based on convolutional neural networks (ConvNets) is proposed to facilitate extracting compensation features from different modalities for human action recognition. Built on an information aggregation mechanism and deep ConvNets, our recognition framework represents spatial-temporal information from the base modalities with a designed frame difference aggregation spatial-temporal module (FDA-STM), while the network bridges information from skeleton data through a multimodal supervised compensation block (SCB) to supervise the extraction of compensation features. We evaluate the proposed recognition framework on three human action datasets: NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD. The results demonstrate that our model with FDA-STM and SCB achieves state-of-the-art recognition performance on all three benchmark datasets. |
ISSN: | 1674-7321; 1869-1900 |
DOI: | 10.1007/s11431-023-2491-4 |
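The abstract's frame difference aggregation idea — deriving motion cues by differencing consecutive frames and aggregating them over time — can be sketched minimally as below. This is an illustrative fixed operation only; the paper's FDA-STM is a learned ConvNet module, and the function name and clip shape here are assumptions, not the authors' implementation.

```python
import numpy as np

def frame_difference_aggregation(frames):
    """Aggregate absolute differences of consecutive frames into one
    spatial-temporal map. Input shape: (T, H, W, C).
    Illustrative sketch, not the paper's learned FDA-STM module."""
    frames = np.asarray(frames, dtype=np.float32)
    diffs = np.abs(frames[1:] - frames[:-1])  # (T-1, H, W, C) motion cues
    return diffs.mean(axis=0)                 # average over the time axis

# Toy clip: 4 frames of 2x2 single-channel data with constant motion.
clip = np.stack([np.full((2, 2, 1), t, dtype=np.float32) for t in range(4)])
agg = frame_difference_aggregation(clip)
print(agg.shape)  # (2, 2, 1)
```

In the paper's pipeline such aggregated motion representations would feed a deep ConvNet, with the SCB supervising compensation-feature extraction from skeleton data.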