Learning a discriminative mid-level feature for action recognition

Bibliographic details
Published in:	Science China. Information sciences 2014-05, Vol. 57 (5), pp. 191-203
Authors:	Liu, CuiWei; Pei, MingTao; Wu, XinXiao; Kong, Yu; Jia, YunDe
Format:	Article
Language:	English
Keywords:	3D features; Activity recognition; Algorithms; Categories; Computer Science; Context; Feature recognition; Forests; Histograms; Human; Information Systems and Communication Service; Learning; Machine learning; Optical flow (image analysis); Research Paper; Robustness; Temporal logic; Three dimensional; Three dimensional flow; Human behavior; Global features; Action recognition; Behavior recognition; Discrimination

Description
Abstract:	In this paper, we address the problem of recognizing human actions from videos. Most existing approaches represent an action video with low-level features (e.g., local features and global features). However, algorithms based on low-level features are not robust to complex environments such as cluttered backgrounds, camera movement, and illumination changes. We therefore propose a novel random forest learning framework that constructs a discriminative and informative mid-level feature from the low-level features of densely sampled 3D cuboids. Each cuboid is classified by the corresponding random forests using a novel fusion scheme, and the cuboid's posterior probabilities over all categories are normalized to form a histogram. The mid-level feature is then obtained by concatenating the histograms of all the cuboids. Since a single low-level feature is not enough to capture the variations of human actions, multiple complementary low-level features (i.e., optical flow and histogram of gradient 3D features) are employed to describe the 3D cuboids. Moreover, the temporal context between local cuboids is exploited as another type of low-level feature. These three low-level features (i.e., optical flow, histogram of gradient 3D features, and temporal context) are fused in the proposed learning framework. Finally, a random forest classifier uses the mid-level feature for robust action recognition. Experiments on the Weizmann, UCF Sports, Ballet, and multi-view IXMAS datasets demonstrate that our mid-level feature, learned from multiple low-level features, outperforms state-of-the-art methods.
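The mid-level feature described in the abstract (fuse each cuboid's class posteriors, normalize them into a histogram, then concatenate the histograms of all cuboids) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' implementation. All function names are hypothetical. The cuboid size and stride passed to `dense_cuboid_grid` are illustrative, since the abstract gives no values. The `mean` and `product` rules in `fuse_posteriors` are generic stand-ins for the paper's own fusion scheme, which the abstract does not specify. The sketch also assumes every video is sampled on the same grid, so that concatenated histograms line up position by position.

```python
import math
from typing import List, Sequence, Tuple

Posterior = Sequence[float]  # class posteriors from one forest, one entry per category


def dense_cuboid_grid(
    T: int, H: int, W: int,
    size: Tuple[int, int, int], stride: Tuple[int, int, int],
) -> List[Tuple[int, int, int]]:
    """Return (t, y, x) origins of densely sampled cuboids in a fixed raster order.

    HYPOTHETICAL parameters: cuboid size and stride are not given in the abstract.
    """
    st, sy, sx = size
    dt, dy, dx = stride
    return [
        (t, y, x)
        for t in range(0, T - st + 1, dt)
        for y in range(0, H - sy + 1, dy)
        for x in range(0, W - sx + 1, dx)
    ]


def fuse_posteriors(per_feature: Sequence[Posterior], rule: str = "mean") -> List[float]:
    """Combine one cuboid's posteriors from several forests (one per low-level feature).

    ASSUMPTION: 'mean' and 'product' are generic stand-ins for the paper's
    unspecified fusion scheme.
    """
    n_classes = len(per_feature[0])
    if any(len(p) != n_classes for p in per_feature):
        raise ValueError("all forests must score the same categories")
    columns = list(zip(*per_feature))  # columns[c] = scores for class c from every forest
    if rule == "mean":
        return [sum(col) / len(col) for col in columns]
    if rule == "product":
        return [math.prod(col) for col in columns]
    raise ValueError(f"unknown fusion rule: {rule!r}")


def normalize(scores: Sequence[float]) -> List[float]:
    """Rescale fused scores into a histogram that sums to 1."""
    total = sum(scores)
    if total <= 0:
        # Degenerate case (e.g. a product of zeros): fall back to a uniform histogram.
        return [1.0 / len(scores)] * len(scores)
    return [s / total for s in scores]


def mid_level_feature(
    cuboid_posteriors: Sequence[Sequence[Posterior]], rule: str = "mean"
) -> List[float]:
    """Concatenate per-cuboid histograms into one vector of length n_cuboids * n_classes.

    cuboid_posteriors[k][f] is the posterior vector that the forest for low-level
    feature f assigns to cuboid k; cuboids must follow the same grid order in every video.
    """
    feature: List[float] = []
    for per_feature in cuboid_posteriors:
        feature.extend(normalize(fuse_posteriors(per_feature, rule)))
    return feature
```

For two cuboids, two forests, and two categories, `mid_level_feature([[[0.5, 0.5], [1.0, 0.0]], [[0.25, 0.75], [0.25, 0.75]]])` returns `[0.75, 0.25, 0.25, 0.75]`. With `rule="product"`, the same input gives `[1.0, 0.0, 0.1, 0.9]`. Under the mean rule, normalization changes nothing when each forest's posteriors already sum to 1. It matters for rules such as the product, which do not preserve that sum. In this sketch, the temporal-context feature would simply be one more forest per cuboid. The resulting per-video vectors could then be passed to any random forest implementation (for example, scikit-learn's `RandomForestClassifier.fit(X, y)`) for the final classification step.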
ISSN:	1674-733X; 1869-1919
DOI:	10.1007/s11432-013-4938-y