ESDAR-net: towards high-accuracy and real-time driver action recognition for embedded systems

Bibliographic details
Published in:	Multimedia Tools and Applications, 2024-02, Vol. 83 (6), pp. 18281-18307
Main authors:	Hu, Yaocong; Shuai, Zhen; Yang, Huicheng; Wan, Guoyang; Zhang, Yajun; Xie, Chao; Lu, Mingqi; Lu, Xiaobo
Format:	Article
Language:	English
Subjects:	Accuracy; Activity recognition; Computational efficiency; Computer Communication Networks; Computer Science; Computing costs; Data Structures and Information Theory; Deep learning; Embedded systems; Modules; Multimedia Information Systems; Real time; Special Purpose and Application-Based Systems; Track 6: Computer Vision for Multimedia Applications
Online access:	Full text

Description
Summary:	Existing driver action recognition approaches suffer from a bottleneck: the trade-off between recognition accuracy and computational efficiency. More specifically, high-capacity spatial-temporal deep learning models cannot achieve real-time driver action recognition on vehicle-mounted devices. To overcome this limitation, this paper puts forward a novel driver action recognition solution suitable for embedded systems. The proposed ESDAR-Net is a multi-branch deep learning framework that directly processes compressed videos. To reduce the computational cost, a lightweight 2D/3D convolutional network is employed for spatial-temporal modeling. Moreover, two strategies are implemented to boost accuracy: (1) a cross-layer connection module (CLCM) and a spatial-temporal trilinear pooling module (STTPM) are designed to adaptively fuse appearance and motion information; (2) complementary knowledge from the high-capacity spatial-temporal deep learning model is distilled and transferred to the proposed ESDAR-Net. Experimental results show that the proposed ESDAR-Net achieves both high accuracy and real-time performance for driver action recognition. Accuracy reaches 98.7% on SEU-DAR-V1 and 96.5% on SEU-DAR-V2, with 2.19M learnable parameters, 0.253G FLOPs, and a speed of 27 clips/s on a Jetson TX2.
ISSN:	1573-7721, 1380-7501
DOI:	10.1007/s11042-023-15777-0
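
Illustrative note: the summary says that knowledge from a high-capacity model is distilled into ESDAR-Net, but this record does not give the loss the authors use. The sketch below shows one common formulation (a temperature-softened KL term combined with a hard-label cross-entropy term) purely as an assumption. The function names, the temperature of 4.0, and the weight alpha of 0.5 are hypothetical and are not taken from the paper.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits, softened by a temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, label,
                      temperature=4.0, alpha=0.5):
    """Generic distillation loss (assumed form, not the paper's):
    alpha * CE(student, label) + (1 - alpha) * T^2 * KL(teacher || student).
    """
    # Hard-label cross-entropy, computed at temperature 1.
    p_student = softmax(student_logits)
    cross_entropy = -math.log(p_student[label])

    # Soft-target KL divergence between temperature-softened distributions.
    q_teacher = softmax(teacher_logits, temperature)
    q_student = softmax(student_logits, temperature)
    kl = sum(t * math.log(t / s) for t, s in zip(q_teacher, q_student) if t > 0)

    # The T^2 factor keeps the soft-target gradients on a comparable scale.
    return alpha * cross_entropy + (1 - alpha) * temperature ** 2 * kl
```

When the student's logits match the teacher's, the KL term vanishes and only the weighted cross-entropy remains; any disagreement with the teacher adds a non-negative penalty.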