A Super Descriptor Tensor Decomposition for Dynamic Scene Recognition


Bibliographic Details
Published in: IEEE Transactions on Circuits and Systems for Video Technology, 2019-04, Vol. 29 (4), p. 1063-1076
Main authors: Khokher, Muhammad Rizwan; Bouzerdoum, Abdesselam; Phung, Son Lam
Format: Article
Language: English
Description
Abstract: This paper presents a new approach for dynamic scene recognition based on a super descriptor tensor decomposition. Recently, local feature extraction based on dense trajectories has been used for modeling motion. However, dense trajectories usually include a large number of unnecessary trajectories, which increase noise, add complexity, and limit the recognition accuracy. Another problem is that traditional bag-of-words techniques encode and concatenate the local features extracted from multiple descriptors to form a single large vector for classification. This concatenation not only destroys the spatio-temporal structure among the features but also yields high dimensionality. To address these problems, first, we propose to refine the dense trajectories by selecting only salient trajectories within a region of interest containing motion. Visual descriptors consisting of oriented gradient and motion boundary histograms are then computed along the refined dense trajectories. In the case of camera motion, a short-window video stabilization is integrated to compensate for global motion. Second, the features extracted from multiple descriptors are encoded using a super descriptor tensor model. To this end, the Tucker-3 tensor decomposition is employed to obtain a compact set of salient features, followed by feature selection via Fisher ranking. Experiments are conducted on two benchmark dynamic scene recognition datasets: Maryland "in-the-wild" and YUPENN dynamic scenes. Experimental results show that the proposed approach outperforms several existing methods in terms of recognition accuracy and achieves performance comparable with state-of-the-art deep learning methods, with classification rates of 89.2% on the Maryland dataset and 98.1% on the YUPENN dataset.
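
As a rough illustration of the encoding stage outlined in the abstract, the sketch below (not taken from the paper) arranges per-video descriptor statistics as a third-order tensor, compresses it with a Tucker-3 decomposition using the tensorly library, and ranks the resulting features by Fisher score. The tensor layout (trajectories x histogram bins x descriptor types), the chosen ranks, and the fisher_scores helper are illustrative assumptions, not the authors' exact formulation.

    # Minimal sketch (not the authors' code) of the encoding stage described in the
    # abstract: per-video descriptors arranged as a 3rd-order tensor, compressed with
    # a Tucker-3 decomposition, then ranked with Fisher scores.
    # The tensor layout, ranks, and fisher_scores helper are illustrative assumptions.
    import numpy as np
    import tensorly as tl
    from tensorly.decomposition import tucker

    def encode_video(descriptor_tensor, ranks=(20, 20, 2)):
        # Tucker-3 decomposition of an assumed (trajectories x bins x descriptor-types)
        # tensor; the vectorized core serves as a compact video representation.
        core, _factors = tucker(tl.tensor(descriptor_tensor), rank=list(ranks))
        return tl.to_numpy(core).ravel()

    def fisher_scores(X, y):
        # Fisher ranking: between-class over within-class variance, per feature.
        overall_mean = X.mean(axis=0)
        num = np.zeros(X.shape[1])
        den = np.zeros(X.shape[1])
        for c in np.unique(y):
            Xc = X[y == c]
            num += Xc.shape[0] * (Xc.mean(axis=0) - overall_mean) ** 2
            den += Xc.shape[0] * Xc.var(axis=0)
        return num / (den + 1e-12)

    # Toy usage: 10 videos, each summarized by a random 100 x 32 x 2 tensor
    # (e.g., oriented-gradient and motion-boundary histograms along refined
    # trajectories), split into two scene classes.
    rng = np.random.default_rng(0)
    X = np.vstack([encode_video(rng.standard_normal((100, 32, 2))) for _ in range(10)])
    y = np.array([0] * 5 + [1] * 5)
    top = np.argsort(fisher_scores(X, y))[::-1][:50]  # keep the 50 highest-ranked features
    X_selected = X[:, top]                            # input to a classifier, e.g. an SVM

In practice, the selected features would feed a conventional classifier such as an SVM, mirroring the classification step mentioned in the abstract.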
ISSN: 1051-8215, 1558-2205
DOI: 10.1109/TCSVT.2018.2825784