Multi-modal LSTM video action prediction method based on self-attention mechanism
Format: Patent
Language: Chinese; English
Abstract: The invention relates to a multi-modal LSTM video action prediction method based on a self-attention mechanism. The method comprises the following steps:

1. Prepare a training data set and preprocess the original video to obtain RGB pictures and optical-flow pictures.
2. Extract RGB features and optical-flow features with a TSN network from the RGB and optical-flow pictures, and obtain target-detection-related features with a FastRCNN target detector from the training data set.
3. Establish a multi-modal LSTM network model based on a self-attention mechanism, input the RGB features, optical-flow features, and target-detection-related features obtained in step 2 into the network model for training, and output the corresponding action-type distribution tensors.
4. Establish a fusion network to assign weights to the action-type distribution tensors, and combine the weights with the action-type distribution tensors to obtain a final …
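The self-attention mechanism of step 3 can be sketched as scaled dot-product attention over a sequence of per-frame features. This is a minimal NumPy illustration only: the patent's projection weights, the LSTM cells, and the training procedure are all omitted, and identity query/key/value projections are an assumption made here for brevity.

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention over per-frame features.

    X: (T, d) array of T frame-level feature vectors (e.g. TSN RGB or
    optical-flow features). Query, key, and value projections are taken
    as the identity here; the patent's learned projections are unspecified.
    """
    d = X.shape[1]
    scores = X @ X.T / np.sqrt(d)                 # (T, T) pairwise similarities
    scores -= scores.max(axis=1, keepdims=True)   # shift for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True) # row-wise softmax over frames
    return weights @ X                            # (T, d) attended features

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 16))     # 8 frames, 16-dim features (hypothetical sizes)
attended = self_attention(X)
print(attended.shape)            # (8, 16)
```

Each output row is a convex combination of all frame features, so frames relevant to the upcoming action can reweight the whole sequence before it enters the LSTM.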
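The late-fusion step (step 4) amounts to a weighted combination of the per-modality action-type distributions. The sketch below uses softmax-normalized weights over three modality branches; the fixed weight logits and class counts are illustrative assumptions, since the patent's fusion network would learn these from data.

```python
import numpy as np

def softmax(z):
    """Numerically stable softmax over a 1-D array."""
    z = np.asarray(z, dtype=float) - np.max(z)
    e = np.exp(z)
    return e / e.sum()

def fuse(distributions, logits):
    """Fuse per-modality action-class distributions with scalar weights.

    distributions: (M, C) array, one action-type distribution per modality
    (RGB, optical flow, detection features). logits: (M,) raw weight scores,
    fixed here for illustration rather than produced by a trained network.
    """
    w = softmax(logits)                      # modality weights, sum to 1
    return w @ np.asarray(distributions)     # (C,) fused class distribution

# Three modality branches, four action classes (hypothetical numbers).
dists = np.array([
    [0.7, 0.1, 0.1, 0.1],   # RGB branch
    [0.5, 0.3, 0.1, 0.1],   # optical-flow branch
    [0.6, 0.2, 0.1, 0.1],   # detection-feature branch
])
fused = fuse(dists, logits=[1.0, 0.5, 0.5])
print(fused.round(3))        # fused distribution, still sums to ~1
```

Because the weights form a convex combination, the fused output remains a valid probability distribution whenever each branch outputs one.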