Segment differential aggregation representation and supervised compensation learning of ConvNets for human action recognition
Saved in:
Published in: | Science China. Technological Sciences, 2024, Vol. 67(1), p. 197-208 |
Main Authors: | , , , , , |
Format: | Article |
Language: | English |
Subjects: | |
Online Access: | Full text |
Abstract: | With more multi-modal data available for visual classification tasks, human action recognition has become an increasingly attractive topic. However, one of the main challenges is to effectively extract complementary features from different modalities for action recognition. In this work, a novel multimodal supervised learning framework based on convolutional neural networks (ConvNets) is proposed to facilitate extracting compensation features from different modalities for human action recognition. Built on an information aggregation mechanism and deep ConvNets, our recognition framework represents spatial-temporal information from the base modalities with a designed frame difference aggregation spatial-temporal module (FDA-STM), while the network bridges information from skeleton data through a multimodal supervised compensation block (SCB) to supervise the extraction of compensation features. We evaluate the proposed recognition framework on three human action datasets: NTU RGB+D 60, NTU RGB+D 120, and PKU-MMD. The results demonstrate that our model with FDA-STM and SCB achieves state-of-the-art recognition performance on all three benchmark datasets. |
ISSN: | 1674-7321; 1869-1900 |
DOI: | 10.1007/s11431-023-2491-4 |
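The abstract's frame difference aggregation idea — deriving motion cues by differencing consecutive frames and aggregating them over time — can be sketched minimally as below. This is an illustrative fixed operation only; the paper's FDA-STM is a learned ConvNet module, and the function name and clip shape here are assumptions, not the authors' implementation.

```python
import numpy as np

def frame_difference_aggregation(frames):
    """Aggregate absolute differences of consecutive frames into one
    spatial-temporal map. Input shape: (T, H, W, C).
    Illustrative sketch, not the paper's learned FDA-STM module."""
    frames = np.asarray(frames, dtype=np.float32)
    diffs = np.abs(frames[1:] - frames[:-1])  # (T-1, H, W, C) motion cues
    return diffs.mean(axis=0)                 # average over the time axis

# Toy clip: 4 frames of 2x2 single-channel data with constant motion.
clip = np.stack([np.full((2, 2, 1), t, dtype=np.float32) for t in range(4)])
agg = frame_difference_aggregation(clip)
print(agg.shape)  # (2, 2, 1)
```

In the paper's pipeline such aggregated motion representations would feed a deep ConvNet, with the SCB supervising compensation-feature extraction from skeleton data.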