A four-stream ConvNet based on spatial and depth flow for human action classification using RGB-D data
Saved in:
Published in: Multimedia Tools and Applications, 2020-05, Vol. 79 (17-18), pp. 11723-11746
Main authors: , , , , , ,
Format: Article
Language: English
Online access: Full text
Abstract: Appearance- and depth-based action recognition has been researched extensively to improve recognition accuracy by exploiting motion and shape-recovery cues from RGB-D video data. Convolutional neural networks (CNNs) have shown evidence of superiority on action classification problems with spatial and apparent-motion inputs. The current generation of CNNs uses spatial RGB videos and depth maps to recognize action classes from RGB-D video. In this work, we propose a 4-stream CNN architecture with two spatial streams for the RGB-D video data and two apparent-motion streams whose inputs are extracted from the optical flow of the RGB-D videos. Each CNN stream comprises 8 convolutional layers, 2 dense layers and one SoftMax layer, and a score fusion model merges the scores from the four streams. The performance of the proposed 4-stream action recognition framework is tested on our own action dataset and on three benchmark action recognition datasets. The usefulness of the proposed model is evaluated against state-of-the-art CNN architectures for action recognition.
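The abstract specifies only the high-level layout of each stream (8 convolutional layers, 2 dense layers, a SoftMax layer) and a score fusion step over the four streams. As a rough illustration of that layout, here is a minimal PyTorch sketch; the channel widths, kernel sizes, pooling placement, input resolution and the simple averaging fusion are assumptions made for illustration, since the record does not give the paper's actual settings.

```python
# Minimal sketch of the four-stream layout described in the abstract.
# Channel widths, kernel sizes, pooling schedule, input size and the
# averaging score fusion are assumptions, not the paper's exact design.
import torch
import torch.nn as nn


class StreamCNN(nn.Module):
    """One stream: 8 convolutional layers, 2 dense layers, SoftMax output."""

    def __init__(self, in_channels: int, num_classes: int):
        super().__init__()
        widths = [32, 32, 64, 64, 128, 128, 256, 256]  # assumed channel widths
        layers, prev = [], in_channels
        for i, w in enumerate(widths):
            layers += [nn.Conv2d(prev, w, kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
            if i % 2 == 1:                 # pool after every second conv (assumption)
                layers.append(nn.MaxPool2d(2))
            prev = w
        self.features = nn.Sequential(*layers)
        self.classifier = nn.Sequential(
            nn.AdaptiveAvgPool2d(1),
            nn.Flatten(),
            nn.Linear(prev, 512), nn.ReLU(inplace=True),  # dense layer 1
            nn.Linear(512, num_classes),                  # dense layer 2
            nn.Softmax(dim=1),                            # per-stream class scores
        )

    def forward(self, x):
        return self.classifier(self.features(x))


class FourStreamNet(nn.Module):
    """Two spatial streams (RGB, depth) and two apparent-motion streams
    (optical flow of RGB and of depth), merged by score fusion."""

    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.rgb = StreamCNN(3, num_classes)          # spatial RGB frames
        self.depth = StreamCNN(1, num_classes)        # spatial depth maps
        self.rgb_flow = StreamCNN(2, num_classes)     # optical flow of RGB (x/y)
        self.depth_flow = StreamCNN(2, num_classes)   # optical flow of depth (x/y)

    def forward(self, rgb, depth, rgb_flow, depth_flow):
        scores = torch.stack([
            self.rgb(rgb), self.depth(depth),
            self.rgb_flow(rgb_flow), self.depth_flow(depth_flow),
        ])
        return scores.mean(dim=0)  # simple averaging fusion (assumption)


if __name__ == "__main__":
    net = FourStreamNet(num_classes=10)
    b = 2
    out = net(torch.randn(b, 3, 112, 112), torch.randn(b, 1, 112, 112),
              torch.randn(b, 2, 112, 112), torch.randn(b, 2, 112, 112))
    print(out.shape)  # torch.Size([2, 10])
```

In this sketch each stream produces its own SoftMax class-score vector, and fusion is a plain average of the four vectors; weighted or learned fusion would slot into the same place if the paper uses one.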
ISSN: 1380-7501, 1573-7721
DOI: 10.1007/s11042-019-08588-9