Second-order motion descriptors for efficient action recognition
Published in: Pattern analysis and applications : PAA 2021-05, Vol.24 (2), p.473-482
Main authors: , ,
Format: Article
Language: eng
Subjects:
Online access: Full text
Abstract: Human action recognition from realistic video data constitutes a challenging and relevant research area. The state of the art is led by methods based on convolutional neural networks (CNNs), especially two-stream CNNs. In this family of deep architectures, the appearance channel learns from the RGB images and the motion channel learns from a motion representation, usually the optical flow. Given that action recognition requires extracting complex motion-pattern descriptors from image sequences, we introduce a new set of second-order motion representations that capture both geometrical and kinematic properties of the motion (curl, divergence, curvature, and acceleration). In addition, we present a new and effective strategy that reduces training times without sacrificing performance when using the I3D two-stream CNN and that is robust to the weakness of a single channel. The experiments presented in this paper were carried out on two of the most challenging datasets for action recognition: UCF101 and HMDB51. The reported results show an improvement in accuracy on the UCF101 dataset, where an accuracy of 98.45% is achieved when curvature and acceleration are combined as the motion representation. On HMDB51, our approach shows competitive performance, achieving an accuracy of 80.19%. On both datasets, our approach considerably reduces the time required for the preprocessing and training phases: preprocessing time is reduced to a sixth, and the motion stream can be trained in a third of the time usually employed.
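The second-order descriptors named in the abstract (curl, divergence, curvature, acceleration) are all derivatives of the optical-flow field. As a minimal illustrative sketch, not the authors' implementation: assuming a dense flow field with horizontal and vertical components `u` and `v` sampled on a regular pixel grid, curl and divergence can be obtained with finite differences.

```python
import numpy as np

def flow_derivatives(u, v):
    """Divergence and curl of a dense 2D flow field (u, v).

    u, v: 2D arrays holding the horizontal and vertical flow components.
    """
    # np.gradient returns derivatives along axis 0 (y, rows) then axis 1 (x, cols).
    du_dy, du_dx = np.gradient(u)
    dv_dy, dv_dx = np.gradient(v)
    div = du_dx + dv_dy   # divergence: local expansion/contraction of the motion
    curl = dv_dx - du_dy  # curl: local rotation of the motion
    return div, curl

# Synthetic rigid rotation about the image centre: u = -(y - cy), v = (x - cx).
h, w = 64, 64
y, x = np.mgrid[0:h, 0:w].astype(float)
u = -(y - h / 2)
v = (x - w / 2)
div, curl = flow_derivatives(u, v)
# For this pure rotation the divergence is 0 and the curl is 2 everywhere.
```

Acceleration would additionally require the temporal difference of the flow between consecutive frames, and curvature the change of flow direction along trajectories; this sketch only shows the spatial derivatives.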
ISSN: 1433-7541, 1433-755X
DOI: 10.1007/s10044-020-00924-2