A compact and recursive Riemannian motion descriptor for untrimmed activity recognition

Detailed Description

Bibliographic Details
Published in: Journal of Real-Time Image Processing 2021-12, Vol. 18 (6), pp. 1867-1880
Main Authors: Martínez Carrillo, Fabio; Gouiffès, Michèle; Garzón Villamizar, Gustavo; Manzanera, Antoine
Format: Article
Language: English
Online Access: Full text
Description
Summary: A very low-dimensional frame-level motion descriptor is proposed, with the capability to represent incomplete dynamics and thus allow online action prediction. At each frame, a set of local trajectory kinematic cues is spatially pooled using a covariance matrix. The set of frame-level covariance matrices forms a Riemannian manifold that describes motion patterns. A set of statistical measures is computed over this manifold to characterize the sequence dynamics, either globally or instantaneously from a motion history. Regarding the Riemannian metrics, two versions are proposed: (1) tangent projections with respect to recursively updated statistics, and (2) mapping each covariance matrix onto a linear matrix space, using the identity matrix as reference. The approach was evaluated on two tasks: (1) action classification on complete video sequences, and (2) online action recognition, in which the activity is predicted at each frame. The method was evaluated on two public datasets, KTH and UT-Interaction, achieving average classification accuracies of 92.27% and 81.67%, respectively. In the partial recognition task, the method achieved a classification rate similar to that of the whole sequence using only the first 40% of KTH sequences and 70% of UT-Interaction sequences. The code of this work is available at [code].
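
As a rough illustration of the covariance pooling and the identity-referenced linear mapping described in the summary, a minimal Python sketch follows. It is written under stated assumptions: the function names and the cue dimensionality are hypothetical, and the identity-referenced mapping is interpreted here as the matrix logarithm (the tangent projection at the identity matrix); the authors' actual implementation is the one linked at [code].

import numpy as np

def frame_covariance(kinematic_cues, eps=1e-6):
    # Spatially pool the kinematic cues of one frame's local
    # trajectories into a covariance matrix (an SPD descriptor).
    # kinematic_cues: (n_trajectories, d) array, one row per local
    # trajectory, one column per kinematic cue.
    c = np.cov(kinematic_cues, rowvar=False)
    # A small ridge keeps the matrix strictly positive definite.
    return c + eps * np.eye(c.shape[0])

def identity_referenced_map(cov):
    # Map an SPD matrix to a linear matrix space via the matrix
    # logarithm, i.e. the tangent projection at the identity.
    w, v = np.linalg.eigh(cov)    # eigendecomposition of an SPD matrix
    return (v * np.log(w)) @ v.T  # v @ diag(log w) @ v^T

# Example: pool d = 4 hypothetical kinematic cues over 50 trajectories.
rng = np.random.default_rng(0)
cues = rng.standard_normal((50, 4))
log_cov = identity_referenced_map(frame_covariance(cues))
# The symmetric log-matrix can be flattened (upper triangle) into a
# compact frame-level feature vector.
feature = log_cov[np.triu_indices(4)]

Because the mapped matrices live in a vector space, per-frame statistics such as means and recursive updates over a motion history reduce to ordinary Euclidean operations, which is presumably what keeps the descriptor cheap enough for online prediction.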
ISSN: 1861-8200
1861-8219
DOI: 10.1007/s11554-020-01057-9