A multimodal deep neural network for prediction of the driver’s focus of attention based on anthropomorphic attention mechanism and prior knowledge

Bibliographic details
Published in: Expert Systems with Applications, 2023-03, Vol. 214, Article 119157
Main authors: Fu, Rui; Huang, Tao; Li, Mingyue; Sun, Qinyu; Chen, Yunxing
Format: Article
Language: English
Description
Abstract:
• More comprehensive driving-scene information is used as the input.
• An anthropomorphic attention mechanism is developed to calculate the importance of each pixel.
• A graph attention network is adopted to learn semantic context features.
• A convolutional long short-term memory network handles the transition of fused features across successive frames.
• A training method based on prior knowledge is designed.

Predicting the driver’s focus of attention (DFoA) is becoming essential for driver distraction detection and intelligent vehicles. This work therefore attempts to predict DFoA. The driving environment, however, is a complex and dynamically changing scene, and existing methods neither make full use of driving-scene information nor account for the differing importance of objects and regions in the scene. To address this, we propose a multimodal deep neural network based on an anthropomorphic attention mechanism and prior knowledge (MDNN-AAM-PK). Specifically, MDNN-AAM-PK takes more comprehensive driving-scene information as input: RGB images, semantic images, optical flow images, and depth images of successive frames. An anthropomorphic attention mechanism calculates the importance of each pixel in the driving scene. A graph attention network learns semantic context features. A convolutional long short-term memory network (ConvLSTM) handles the transition of fused features across successive frames. Furthermore, a training method based on prior knowledge is designed to improve training efficiency and DFoA prediction performance. Experiments, comprising a comparison with state-of-the-art methods, an ablation study of the proposed method, evaluation on different datasets, and a visual assessment experiment on a vehicle simulation platform, show that the proposed method predicts DFoA accurately and outperforms the state-of-the-art methods.
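
The abstract does not give the formula behind the anthropomorphic attention mechanism, so the following is only a rough sketch of the general idea of weighting each pixel by an importance score before fusing the four input modalities. The scoring rule (mean response across modalities), the function names, and the plain nested-list representation are all assumptions made for illustration and are not taken from the paper.

```python
import math

# The four input modalities named in the abstract.
MODALITIES = ("rgb", "semantic", "flow", "depth")


def spatial_softmax(scores):
    """Turn an H x W grid of raw scores into weights that sum to 1."""
    peak = max(max(row) for row in scores)  # subtract the max for numerical stability
    exps = [[math.exp(s - peak) for s in row] for row in scores]
    total = sum(sum(row) for row in exps)
    return [[e / total for e in row] for row in exps]


def fuse_with_pixel_importance(features):
    """Average the modality maps per pixel, then reweight by an importance map.

    `features` maps each modality name to an H x W grid of floats.
    Hypothetical scoring rule (an assumption, not the paper's mechanism):
    a pixel's raw score is its mean response across the modalities.
    """
    grids = [features[name] for name in MODALITIES]
    height, width = len(grids[0]), len(grids[0][0])
    mean = [[sum(g[y][x] for g in grids) / len(grids) for x in range(width)]
            for y in range(height)]
    importance = spatial_softmax(mean)
    fused = [[mean[y][x] * importance[y][x] for x in range(width)]
             for y in range(height)]
    return importance, fused
```

Calling `fuse_with_pixel_importance` with one small grid per modality returns the importance map and the reweighted fused map. In a model of the kind the abstract describes, the scores would presumably come from learned layers rather than a fixed mean.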
ISSN: 0957-4174, 1873-6793
DOI: 10.1016/j.eswa.2022.119157