Action Recognition and Benchmark Using Event Cameras

Recent years have witnessed remarkable achievements in video-based action recognition. Apart from traditional frame-based cameras, event cameras are bio-inspired vision sensors that only record pixel-wise brightness changes rather than the brightness value. However, little effort has been made in ev...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Veröffentlicht in:	IEEE transactions on pattern analysis and machine intelligence 2023-12, Vol.PP (12), p.1-17
Hauptverfasser:	Gao, Yue, Lu, Jiaxuan, Li, Siqi, Ma, Nan, Du, Shaoyi, Li, Yipeng, Dai, Qionghai
Format:	Artikel
Sprache:	eng
Schlagworte:	Action recognition Activity recognition Benchmark testing Benchmarks Brightness Cameras Datasets dynamic vision sensor event camera event representation Filtering Recording Task analysis Vision sensors Visualization
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Beschreibung
Zusammenfassung:	Recent years have witnessed remarkable achievements in video-based action recognition. Apart from traditional frame-based cameras, event cameras are bio-inspired vision sensors that only record pixel-wise brightness changes rather than the brightness value. However, little effort has been made in event-based action recognition, and large-scale public datasets are also nearly unavailable. In this paper,we present an event-based action recognition framework called EV-ACT. The Learnable Multi-Fused Representation (LMFR) is first proposed to integrate multiple event information in a learnable manner. The LMFR with dual temporal granularity is fed into the event-based slow-fast network for the fusion of appearance and motion features. A spatial-temporal attention mechanism is introduced to further enhance the learning capability of action recognition. To prompt research in this direction, we have collected the largest event-based action recognition benchmark named \mathbf{THU}^{\mathbf{E-ACT}}\mathbf{-50} and the accompanying \mathbf{THU}^{\mathbf{E-ACT}}\mathbf{-50-CHL} dataset under challenging environments, including a total of over 12,830 recordings from 50 action categories, which is over 4 times the size of the previous largest dataset. Experimental results show that our proposed framework could achieve improvements of over 14.5%, 7.6%, 11.2%, and 7.4% compared to previous works on four benchmarks. We have also deployed our proposed EV-ACT framework on a mobile platform to validate its practicality and efficiency.
ISSN:	0162-8828 1939-3539 2160-9292
DOI:	10.1109/TPAMI.2023.3300741