Efficient search of Top-K video subvolumes for multi-instance action detection

Bibliographic details
Main authors:	Goussies, Norberto A.; Liu, Zicheng; Yuan, Junsong
Format:	Conference paper
Language:	English
Subjects:	Acceleration; action recognition; branch-and-bound; Complexity theory; Real time systems; Search problems; Spatial resolution; Streaming media; Video sequences

Description
Summary:	Action detection was previously formulated as a subvolume mutual information maximization problem, where each subvolume identifies where and when the action occurs in the video. Although the proposed branch-and-bound algorithm can find the best subvolume efficiently for low-resolution videos, it is still not efficient enough to perform multi-instance detection in videos of high spatial resolution. In this paper we develop an algorithm that further speeds up the subvolume search and targets real-time multi-instance action detection for high-resolution videos (e.g. 320 × 240 or higher). Unlike the previous branch-and-bound search technique, which restarts a new search for each action instance, we find the Top-K subvolumes simultaneously with a single round of search. To handle the larger spatial resolution, we downsample the volume of videos for a more efficient upper-bound estimation. To validate our algorithm, we perform experiments on a challenging dataset of 54 video sequences, where each video consists of several actions performed by different people in a crowded environment. The experiments show that our method is not only efficient, but also capable of handling action variations caused by changes in performing speed and style, changes in spatial scale, and cluttered and moving backgrounds.
ISSN:	1945-7871 1945-788X
DOI:	10.1109/ICME.2010.5583547
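
The search problem named in the summary can be illustrated with a small sketch. It is not the paper's branch-and-bound method: it enumerates every axis-aligned subvolume exhaustively, which is only feasible for tiny volumes. It assumes that the score of a subvolume is the sum of per-voxel scores and that the Top-K detections must not overlap; neither assumption is stated in this record. The function names (`integral_volume`, `box_sum`, `overlaps`, `top_k_subvolumes`) are hypothetical.

```python
from itertools import combinations


def integral_volume(scores):
    """Build a (T+1) x (H+1) x (W+1) summed-volume table for a T x H x W nested list."""
    T, H, W = len(scores), len(scores[0]), len(scores[0][0])
    I = [[[0.0] * (W + 1) for _ in range(H + 1)] for _ in range(T + 1)]
    for t in range(T):
        for y in range(H):
            for x in range(W):
                # 3D inclusion-exclusion over the seven already-filled neighbours.
                I[t + 1][y + 1][x + 1] = (
                    scores[t][y][x]
                    + I[t][y + 1][x + 1] + I[t + 1][y][x + 1] + I[t + 1][y + 1][x]
                    - I[t][y][x + 1] - I[t][y + 1][x] - I[t + 1][y][x]
                    + I[t][y][x]
                )
    return I


def box_sum(I, box):
    """Sum of scores inside the half-open box (t0, t1, y0, y1, x0, x1) in O(1)."""
    t0, t1, y0, y1, x0, x1 = box
    return (
        I[t1][y1][x1]
        - I[t0][y1][x1] - I[t1][y0][x1] - I[t1][y1][x0]
        + I[t0][y0][x1] + I[t0][y1][x0] + I[t1][y0][x0]
        - I[t0][y0][x0]
    )


def overlaps(a, b):
    """True if two half-open boxes intersect on all three axes."""
    return all(a[i] < b[i + 1] and b[i] < a[i + 1] for i in (0, 2, 4))


def top_k_subvolumes(scores, k):
    """Return up to k non-overlapping subvolumes as (score, box), best first.

    Assumption: candidates are ranked once, then selected greedily, so no
    search is restarted per instance. The enumeration itself is exhaustive.
    """
    T, H, W = len(scores), len(scores[0]), len(scores[0][0])
    I = integral_volume(scores)
    candidates = []
    for t0, t1 in combinations(range(T + 1), 2):
        for y0, y1 in combinations(range(H + 1), 2):
            for x0, x1 in combinations(range(W + 1), 2):
                box = (t0, t1, y0, y1, x0, x1)
                candidates.append((box_sum(I, box), box))
    candidates.sort(reverse=True)

    picked = []
    for score, box in candidates:
        if len(picked) == k:
            break
        if all(not overlaps(box, other) for _, other in picked):
            picked.append((score, box))
    return picked
```

The number of candidate boxes grows with the square of each dimension, which is why an exhaustive pass like this does not scale to 320 × 240 video and why bounding techniques such as those described in the summary are needed.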