Multi-modal LSTM video action prediction method based on self-attention mechanism

Bibliographic Details
Main Authors: SHAO JIE, MO CHEN
Format: Patent
Language: Chinese; English
Description
Abstract: The invention relates to a multi-modal LSTM video action prediction method based on a self-attention mechanism. The method comprises the following steps:
1. Prepare a training data set and preprocess the original video to obtain RGB pictures and optical flow pictures.
2. Extract RGB features and optical flow features with a TSN network from the RGB and optical flow pictures, and obtain target-detection-related features with a FastRCNN target detector on the training data set.
3. Establish a multi-modal LSTM network model based on a self-attention mechanism, input the RGB features, optical flow features, and target-detection features obtained in step 2 into the network model for training, and output the corresponding action-class distribution tensors for each modality.
4. Establish a fusion network to assign a weight to each action-class distribution tensor, and combine the weights with the distribution tensors to obtain the final action prediction result.
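The attention block in step 3 and the weighted fusion in step 4 can be sketched in a simplified, framework-free form. This is only an illustrative reconstruction from the abstract, not the patent's actual implementation: the function names, the scaled dot-product form of the attention, and the softmax-based modality weighting are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X):
    """Scaled dot-product self-attention over a sequence of frame features.

    X: (T, d) array of T per-frame feature vectors.
    Returns a (T, d) array where each frame is a weighted mix of all frames.
    (Illustrative stand-in for the patent's self-attention mechanism.)
    """
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)          # (T, T) pairwise similarity scores
    return softmax(scores, axis=-1) @ X    # attention-weighted combination

def fuse(distributions, weight_logits):
    """Weighted fusion of per-modality action-class distributions (step 4).

    distributions: list of (C,) arrays, one per branch
                   (e.g. RGB / optical flow / target detection).
    weight_logits: (M,) fusion scores, one per modality; a learned fusion
                   network would produce these, here they are given directly.
    """
    w = softmax(np.asarray(weight_logits, dtype=float))   # modality weights
    return sum(wi * di for wi, di in zip(w, distributions))

# Toy usage: three modality branches, five action classes.
rgb  = np.array([0.70, 0.10, 0.10, 0.05, 0.05])
flow = np.array([0.20, 0.50, 0.10, 0.10, 0.10])
det  = np.array([0.10, 0.10, 0.60, 0.10, 0.10])
fused = fuse([rgb, flow, det], weight_logits=[1.0, 0.5, 0.5])
```

Because each branch outputs a valid probability distribution and the fusion weights are softmax-normalized, the fused vector is itself a valid distribution over action classes; the predicted action is then simply its argmax.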