One-Shot SADI-EPE: A Visual Framework of Event Progress Estimation
Published in: | IEEE Transactions on Circuits and Systems for Video Technology 2019-06, Vol.29 (6), p.1659-1671 |
---|---|
Main authors: | , , , , , , , |
Format: | Article |
Language: | eng |
Abstract: | In many practical engineering applications, particularly for an untrimmed video sequence containing an event that consists of a series of actions, it is important to know how many actions have been finished. In this paper, we term this process visual event progress estimation (EPE). However, little research has addressed this problem. To solve it, this paper presents a framework based on visual human action analysis, namely one-shot simultaneous action detection and identification (SADI)-EPE. Visual EPE is modeled as an online one-shot learning problem; the framework is built on a sliding window and an attention-based bag of key poses. Unlike most action analysis methods, which rely on many training samples from predefined classes, our method can realize SADI for any event given a single sample of that event, which makes it feasible for practical applications. Beyond SADI, our algorithm also realizes progress estimation of the event. In terms of methodology, a key pose is defined by an invariant pose descriptor computed from skeletal data and silhouette data. Moreover, to extract representative and discriminative poses from one training sample, we present a new bidirectional kNN-based attention-weighted key pose selection method, which can filter out unrelated actions and model the differing importance of the key poses. In addition, an attention-based multi-modal fusion scheme, which addresses the difficulty of high-dimensional features and few training samples, is proposed to improve the performance of our algorithm. Finally, we propose an evaluation criterion for the estimation problem. Extensive results demonstrate the efficacy of the proposed framework. |
---|---|
ISSN: | 1051-8215, 1558-2205 |
DOI: | 10.1109/TCSVT.2018.2847305 |