Exploiting Feature and Class Relationships in Video Categorization with Regularized Deep Neural Networks

Bibliographic Details
Published in: IEEE Transactions on Pattern Analysis and Machine Intelligence, 2018-02, Vol. 40 (2), pp. 352-364
Authors: Jiang, Yu-Gang; Wu, Zuxuan; Wang, Jun; Xue, Xiangyang; Chang, Shih-Fu
Format: Article
Language: English
Description
Abstract: In this paper, we study the challenging problem of categorizing videos according to high-level semantics such as the existence of a particular human action or a complex event. Although extensive efforts have been devoted in recent years, most existing works combined multiple video features using simple fusion strategies and neglected the utilization of inter-class semantic relationships. This paper proposes a novel unified framework that jointly exploits the feature relationships and the class relationships for improved categorization performance. Specifically, these two types of relationships are estimated and utilized by imposing regularizations in the learning process of a deep neural network (DNN). By equipping the DNN with a better capability to harness both the feature and the class relationships, the proposed regularized DNN (rDNN) is better suited to modeling video semantics. We show that rDNN outperforms several state-of-the-art approaches, with competitive results reported on the well-known Hollywood2 and Columbia Consumer Video benchmarks. In addition, to stimulate future research on large-scale video categorization, we collect and release a new benchmark dataset, called FCVID, which contains 91,223 Internet videos and 239 manually annotated categories.
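To illustrate the general idea of imposing relationship regularizations on a DNN loss, the following is a minimal conceptual sketch (not the authors' released rDNN implementation). It assumes a two-branch fusion network over two precomputed video features and a given class-similarity matrix; the specific regularizers (a branch-agreement term for feature relationships and a similarity-weighted logit term for class relationships), all names, shapes, and weights are assumptions introduced here for illustration only.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FusedNet(nn.Module):
    """Hypothetical two-branch network fusing two video feature types."""
    def __init__(self, dim_a, dim_b, hidden, num_classes):
        super().__init__()
        self.branch_a = nn.Linear(dim_a, hidden)   # e.g. appearance feature
        self.branch_b = nn.Linear(dim_b, hidden)   # e.g. motion feature
        self.fusion = nn.Linear(2 * hidden, hidden)
        self.classifier = nn.Linear(hidden, num_classes)

    def forward(self, xa, xb):
        ha = F.relu(self.branch_a(xa))
        hb = F.relu(self.branch_b(xb))
        h = F.relu(self.fusion(torch.cat([ha, hb], dim=1)))
        return self.classifier(h), ha, hb

def regularized_loss(logits, targets, ha, hb, class_sim, alpha=1e-3, beta=1e-3):
    """Cross-entropy plus two illustrative relationship regularizers.

    class_sim: (C, C) precomputed class-similarity matrix (assumed given).
    """
    ce = F.cross_entropy(logits, targets)
    # Feature-relationship term (assumption): encourage the two branch
    # representations of the same video to agree.
    feat_reg = F.mse_loss(ha, hb)
    # Class-relationship term (assumption): pull logits of semantically
    # similar classes toward each other, weighted by class_sim.
    diff = logits.unsqueeze(2) - logits.unsqueeze(1)          # (B, C, C)
    class_reg = (class_sim.unsqueeze(0) * diff.pow(2)).mean()
    return ce + alpha * feat_reg + beta * class_reg
```

The point of the sketch is only that both relationship types enter training as additive penalty terms on top of the standard classification loss, which is the high-level mechanism the abstract describes.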
ISSN: 0162-8828, 1939-3539, 2160-9292
DOI: 10.1109/TPAMI.2017.2670560