Depth Video-Based Secondary Action Recognition in Vehicles via Convolutional Neural Network and Bidirectional Long Short-Term Memory with Spatial Enhanced Attention Mechanism
Published in: Sensors (Basel, Switzerland), 2024-10, Vol. 24 (20), p. 6604
Main Authors: , ,
Format: Article
Language: English
Subjects:
Online Access: Full text
Abstract: Secondary actions in vehicles are activities that drivers engage in while driving that are not directly related to the primary task of operating the vehicle. Secondary Action Recognition (SAR) in drivers is vital for enhancing road safety and minimizing accidents related to distracted driving. It also plays an important role in modern driving systems such as Advanced Driving Assistance Systems (ADASs), as it helps identify distractions and predict the driver's intent. Traditional methods of action recognition in vehicles mostly rely on RGB videos, which can be significantly impacted by external conditions such as low light. In this research, we introduce a novel method for SAR. Our approach utilizes depth-video data obtained from a depth sensor located in the vehicle. Our methodology leverages a Convolutional Neural Network (CNN) enhanced by a Spatial Enhanced Attention Mechanism (SEAM) and combined with Bidirectional Long Short-Term Memory (Bi-LSTM) networks. This method significantly improves action recognition in depth videos by strengthening both the spatial and temporal modeling. We conduct experiments using K-fold cross-validation, and the results show that on the public benchmark dataset Drive&Act, our proposed method significantly outperforms state-of-the-art methods, reaching an accuracy of about 84% for SAR in depth videos.
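To make the pipeline described in the abstract concrete, the following is a minimal sketch of a per-frame CNN with a simple spatial attention block feeding a Bi-LSTM over time, written in PyTorch. The layer sizes, the attention design, and the classifier head are illustrative assumptions for a depth-video clip classifier; they are not the authors' exact SEAM or Drive&Act implementation.

```python
# Minimal sketch: per-frame CNN features -> spatial attention -> Bi-LSTM -> logits.
# Assumes single-channel depth frames; all sizes are illustrative, not the paper's.
import torch
import torch.nn as nn


class SpatialAttention(nn.Module):
    """Simple spatial attention: re-weights each location of a CNN feature map."""
    def __init__(self, channels: int):
        super().__init__()
        self.score = nn.Conv2d(channels, 1, kernel_size=1)

    def forward(self, x):                        # x: (B, C, H, W)
        attn = torch.sigmoid(self.score(x))      # (B, 1, H, W) spatial weights
        return x * attn                          # re-weighted feature map


class DepthActionNet(nn.Module):
    """Frame-level CNN + attention, Bi-LSTM over time, linear classifier head."""
    def __init__(self, num_classes: int, hidden: int = 256):
        super().__init__()
        self.cnn = nn.Sequential(                # depth frames have 1 channel
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )
        self.attn = SpatialAttention(64)
        self.pool = nn.AdaptiveAvgPool2d(1)      # global average pooling per frame
        self.bilstm = nn.LSTM(64, hidden, batch_first=True, bidirectional=True)
        self.head = nn.Linear(2 * hidden, num_classes)

    def forward(self, clip):                     # clip: (B, T, 1, H, W)
        b, t = clip.shape[:2]
        x = clip.flatten(0, 1)                   # (B*T, 1, H, W)
        x = self.pool(self.attn(self.cnn(x))).flatten(1)  # (B*T, 64) frame features
        x = x.view(b, t, -1)                     # (B, T, 64) feature sequence
        out, _ = self.bilstm(x)                  # (B, T, 2*hidden)
        return self.head(out[:, -1])             # class logits from last time step


# Usage example: a batch of 2 clips, 16 depth frames each, 112x112 resolution.
model = DepthActionNet(num_classes=34)           # 34 is an assumed class count
logits = model(torch.randn(2, 16, 1, 112, 112))
print(logits.shape)                              # torch.Size([2, 34])
```

The spatial attention re-weights locations of the frame-level feature map before pooling, and the Bi-LSTM aggregates the per-frame features in both temporal directions, mirroring the spatial/temporal split the abstract attributes to SEAM and Bi-LSTM.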
ISSN: 1424-8220
DOI: 10.3390/s24206604