Augmenting bag-of-words: a robust contextual representation of spatiotemporal interest points for action recognition

Although traditional bag-of-words model, together with local spatiotemporal features, has shown promising results for human action recognition, it ignores all structural information of features, which carries important information of motion structures in videos. Recent methods usually characterize t...

Ausführliche Beschreibung

Gespeichert in:
Bibliographische Detailangaben
Veröffentlicht in:The Visual computer 2015-10, Vol.31 (10), p.1383-1394
Hauptverfasser: Li, Yang, Ye, Junyong, Wang, Tongqing, Huang, Shijian
Format: Artikel
Sprache:eng
Schlagworte:
Online-Zugang:Volltext
Tags: Tag hinzufügen
Keine Tags, Fügen Sie den ersten Tag hinzu!
Beschreibung
Zusammenfassung:Although traditional bag-of-words model, together with local spatiotemporal features, has shown promising results for human action recognition, it ignores all structural information of features, which carries important information of motion structures in videos. Recent methods usually characterize the relationship of quantized spatiotemporal features to overcome this drawback. However, the propagation of quantization error leads to an unreliable representation. To alleviate the propagation of quantization error, we present a coding method, which considers not only the spatial similarity but also the reconstruction ability of visual words after giving a probabilistic interpretation of coding coefficients. Based on our coding method, a new type of feature called cumulative probability histogram is proposed to robustly characterize contextual structural information around interest points, which are extracted from multi-layered contexts and assumed to be complementary to local spatiotemporal features. The proposed method is verified on four benchmark datasets. Experiment results show that our method can achieve better performance than previous methods in action recognition.
ISSN:0178-2789
1432-2315
DOI:10.1007/s00371-014-1020-8