A Super Descriptor Tensor Decomposition for Dynamic Scene Recognition
Published in: IEEE Transactions on Circuits and Systems for Video Technology, April 2019, Vol. 29, No. 4, pp. 1063-1076
Main Authors: , ,
Format: Article
Language: English
Online Access: Order full text
Abstract: This paper presents a new approach for dynamic scene recognition based on a super descriptor tensor decomposition. Recently, local feature extraction based on dense trajectories has been used to model motion. However, dense trajectories usually include a large number of unnecessary trajectories, which increase noise, add complexity, and limit recognition accuracy. Another problem is that traditional bag-of-words techniques encode and concatenate the local features extracted from multiple descriptors into a single large vector for classification. This concatenation not only destroys the spatio-temporal structure among the features but also yields high dimensionality. To address these problems, we first refine the dense trajectories by selecting only salient trajectories within a region of interest containing motion. Visual descriptors consisting of histograms of oriented gradients and motion boundary histograms are then computed along the refined dense trajectories. In the case of camera motion, short-window video stabilization is integrated to compensate for global motion. Second, the features extracted from multiple descriptors are encoded using a super descriptor tensor model. To this end, the Tucker-3 tensor decomposition is employed to obtain a compact set of salient features, followed by feature selection via Fisher ranking. Experiments are conducted on two benchmark dynamic scene recognition datasets: Maryland "in-the-wild" and YUPENN dynamic scenes. Experimental results show that the proposed approach outperforms several existing methods in recognition accuracy and performs comparably with state-of-the-art deep learning methods, achieving classification rates of 89.2% on the Maryland dataset and 98.1% on the YUPENN dataset.
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2018.2825784
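
To make the encoding stage described in the abstract more concrete, below is a minimal Python sketch of a Tucker-3 decomposition applied to a super descriptor tensor, followed by Fisher-score feature ranking. The tensor layout, ranks, sample counts, and the use of the tensorly library are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch: compress a super descriptor tensor with Tucker-3 and rank
# the resulting features by Fisher score. All shapes, ranks, and the mode
# layout (trajectories x descriptor-dimension x descriptor-type) are
# assumptions for illustration only.
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

# Hypothetical super descriptor tensor: 500 refined trajectories,
# 96-dimensional descriptors, 3 descriptor types (e.g., HOG, MBHx, MBHy).
X = tl.tensor(np.random.rand(500, 96, 3))

# Tucker-3 yields a small core tensor plus one factor matrix per mode;
# the flattened core serves as a compact feature vector.
core, factors = tucker(X, rank=[20, 16, 3])
features = tl.tensor_to_vec(core)

def fisher_score(F, y):
    """Fisher ratio per feature: between-class over within-class scatter.

    F: (n_samples, n_features) feature matrix, y: (n_samples,) labels.
    """
    mu = F.mean(axis=0)
    between = np.zeros(F.shape[1])
    within = np.zeros(F.shape[1])
    for c in np.unique(y):
        Fc = F[y == c]
        between += len(Fc) * (Fc.mean(axis=0) - mu) ** 2
        within += len(Fc) * Fc.var(axis=0)
    return between / (within + 1e-12)

# Toy usage: rank features for 40 hypothetical videos across 4 classes,
# each video encoded as a flattened core of the same length as `features`.
feats = np.random.rand(40, features.shape[0])
labels = np.repeat(np.arange(4), 10)
ranking = np.argsort(fisher_score(feats, labels))[::-1]  # best first
selected = feats[:, ranking[:100]]  # keep the top-ranked features
```

The Fisher ranking above follows the standard Fisher-ratio definition; how the per-video tensors are combined before classification is not specified by the abstract, so the two operations are shown independently.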