A four-stream ConvNet based on spatial and depth flow for human action classification using RGB-D data

Appearance and depth-based action recognition has been researched exclusively for improving recognition accuracy by considering motion and shape recovery particulars from RGB-D video data. Convolutional neural networks (CNN) have shown evidences of superiority on action classification problems with...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:Multimedia tools and applications 2020-05, Vol.79 (17-18), p.11723-11746
Hauptverfasser: Srihari, D., Kishore, P. V. V., Kumar, E. Kiran, Kumar, D. Anil, Kumar, M. Teja Kiran, Prasad, M. V. D., Prasad, Ch. Raghava
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Appearance and depth-based action recognition has been researched exclusively for improving recognition accuracy by considering motion and shape recovery particulars from RGB-D video data. Convolutional neural networks (CNN) have shown evidences of superiority on action classification problems with spatial and apparent motion inputs. The current generation of CNNs use spatial RGB videos and depth maps to recognize action classes from RGB-D video. In this work, we propose a 4-stream CNN architecture that has two spatial RGB-D video data streams and two apparent motion streams, with inputs extracted from the optical flow of RGB-D videos. Each CNN stream is packed with 8 convolutional layers, 2 dense and one SoftMax layer, and a score fusion model to merge the scores from four streams. Performance of the proposed 4-stream action recognition framework is tested on our own action dataset and three benchmark datasets for action recognition. The usefulness of the proposed model is evaluated with state-of-the-art CNN architectures for action recognition.
ISSN:1380-7501
1573-7721
DOI:10.1007/s11042-019-08588-9