Modelling a Spatial-Motion Deep Learning Framework to Classify Dynamic Patterns of Videos

Bibliographic Details
Published in: Applied Sciences, 2020-02, Vol. 10 (4), p. 1479
Main authors: Kasthuri Arachchi, Sandeli Priyanwada; Shih, Timothy K.; Hakim, Noorkholis Luthfil
Format: Article
Language: English
Online Access: Full text
Description
Abstract: Video classification is an essential process for analyzing the pervasive semantic information of video content in computer vision. Traditional hand-crafted features are insufficient for classifying complex video information, because visual content can appear similar under different illumination conditions. Prior studies of video classification focused on the standalone streams themselves rather than on the relationship between them. In this paper, leveraging deep learning methodologies, we propose a two-stream neural network concept named state-exchanging long short-term memory (SE-LSTM). Through its spatial-motion state exchange, the SE-LSTM can classify dynamic patterns of videos using both appearance and motion features. The SE-LSTM extends the general-purpose LSTM by exchanging information through the previous cell states of both the appearance and motion streams. Building on this concept, we propose a novel two-stream model, Dual-CNNSELSTM, which combines the SE-LSTM with a Convolutional Neural Network, and we validate the proposed architecture on several video datasets. The experimental results demonstrate that the proposed two-stream Dual-CNNSELSTM architecture significantly outperforms the baseline models, achieving accuracies of 81.62%, 79.87%, and 69.86% on the hand gestures, fireworks displays, and HMDB51 datasets, respectively. Furthermore, the overall results indicate that the proposed model is best suited to classifying dynamic patterns against static backgrounds.
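
The state-exchange mechanism summarized above, in which each stream's recurrent cell reads the other stream's previous cell state, can be illustrated with a short sketch. The following Python (PyTorch) code is a minimal illustration under stated assumptions: the class name SELSTMSketch, the mixing weight alpha, and the convex blending rule are hypothetical stand-ins for exposition, not the paper's exact formulation.

import torch
import torch.nn as nn

class SELSTMSketch(nn.Module):
    """Two-stream LSTM with a hypothetical cell-state exchange."""
    def __init__(self, app_dim, mot_dim, hidden_dim, alpha=0.5):
        super().__init__()
        self.app_cell = nn.LSTMCell(app_dim, hidden_dim)  # appearance stream
        self.mot_cell = nn.LSTMCell(mot_dim, hidden_dim)  # motion stream
        self.alpha = alpha  # assumed weight given to the exchanged cell state

    def forward(self, app_seq, mot_seq):
        # app_seq, mot_seq: (time, batch, feature) tensors, e.g. per-frame
        # CNN features (appearance) and optical-flow features (motion).
        steps, batch = app_seq.shape[0], app_seq.shape[1]
        hidden = self.app_cell.hidden_size
        h_a = torch.zeros(batch, hidden); c_a = torch.zeros(batch, hidden)
        h_m = torch.zeros(batch, hidden); c_m = torch.zeros(batch, hidden)
        for t in range(steps):
            # Exchange: each stream's cell consumes a blend of its own
            # previous cell state and the other stream's previous cell state.
            c_a_in = (1 - self.alpha) * c_a + self.alpha * c_m
            c_m_in = (1 - self.alpha) * c_m + self.alpha * c_a
            h_a, c_a = self.app_cell(app_seq[t], (h_a, c_a_in))
            h_m, c_m = self.mot_cell(mot_seq[t], (h_m, c_m_in))
        # Fuse the final hidden states of both streams for a classifier head.
        return torch.cat([h_a, h_m], dim=1)

A Dual-CNNSELSTM-style classifier would place two CNN feature extractors in front of this module and attach a linear classification head to the fused output; those surrounding components are likewise assumptions, as this record only summarizes the architecture.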
ISSN: 2076-3417
DOI: 10.3390/app10041479