Multi-modal LSTM video action prediction method based on self-attention mechanism

Bibliographic Details
Main Authors: SHAO JIE, MO CHEN
Format: Patent
Language: Chinese; English
Description
Abstract: The invention relates to a multi-modal LSTM video action prediction method based on a self-attention mechanism. The method comprises the following steps:
1. Prepare a training data set and preprocess the original video to obtain RGB pictures and optical flow pictures.
2. Extract RGB features and optical flow features with a TSN network from the RGB and optical flow pictures, and obtain target-detection-related features with a FastRCNN target detector on the training data set.
3. Establish a multi-modal LSTM network model based on a self-attention mechanism, input the RGB features, optical flow features, and target-detection features obtained in step 2 into the network model for training, and output the corresponding action-class distribution tensors for each modality.
4. Establish a fusion network to assign a weight to each action-class distribution tensor, and combine the weights with the distribution tensors to obtain the final action prediction result.
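The attention block in step 3 and the weighted fusion in step 4 can be sketched in a simplified, framework-free form. This is only an illustrative reconstruction from the abstract, not the patent's actual implementation: the function names, the scaled dot-product form of the attention, and the softmax-based modality weighting are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of frame features.

    X: (T, d) array of T per-frame feature vectors.
    Returns a (T, d) array where each frame is a weighted mix of all frames.
    (Illustrative stand-in for the patent's self-attention mechanism.)
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)          # (T, T) pairwise similarity scores
    return softmax(scores, axis=-1) @ X    # attention-weighted combination

def fuse(distributions, weight_logits):
    """Weighted fusion of per-modality action-class distributions (step 4).

    distributions: list of (C,) arrays, one per branch
                   (e.g. RGB / optical flow / target detection).
    weight_logits: (M,) fusion scores, one per modality; a learned fusion
                   network would produce these, here they are given directly.
    """
    w = softmax(np.asarray(weight_logits, dtype=float))   # modality weights
    return sum(wi * di for wi, di in zip(w, distributions))

# Toy usage: three modality branches, five action classes.
rgb  = np.array([0.70, 0.10, 0.10, 0.05, 0.05])
flow = np.array([0.20, 0.50, 0.10, 0.10, 0.10])
det  = np.array([0.10, 0.10, 0.60, 0.10, 0.10])
fused = fuse([rgb, flow, det], weight_logits=[1.0, 0.5, 0.5])
```

Because each branch outputs a valid probability distribution and the fusion weights are softmax-normalized, the fused vector is itself a valid distribution over action classes; the predicted action is then simply its argmax.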