Method for solving video question and answer tasks needing common knowledge by using question-knowledge guided progressive space-time attention network
Main authors: | , , , |
---|---|
Format: | Patent |
Language: | chi ; eng |
Abstract: | The invention discloses a method for solving video question-answering tasks that require common-sense knowledge by using a question-knowledge guided progressive space-time attention network. The method comprises the following steps: for a video, obtaining a video object set by using Faster-RCNN; retrieving the annotation text corresponding to the video object set from an external knowledge base to obtain external knowledge; extracting semantic features of the external knowledge by using Doc2Vec to obtain a knowledge feature set for the video; for the question, converting each input word into a word embedding vector by using an embedding layer; and inputting the word embedding vectors into the progressive space-time attention network to generate an answer. By using the additional information, more specific questions, such as common-sense questions, can be answered. External knowledge and the question are combined to guide progressive video attention in the space and time dimensions, and fine-grained joint video representa |
---|
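The guidance idea in the abstract (question and external knowledge jointly steering attention first over objects within a frame, then over frames) can be illustrated with a minimal sketch. This is not the patented network: the record does not give the attention formulas, so the dot-product scoring, the additive combination of question and knowledge vectors, and all function names below are assumptions made for illustration. The toy vectors stand in for Faster-RCNN object features, Doc2Vec knowledge features, and a pooled question embedding.

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of floats.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def weighted_sum(weights, vectors):
    dim = len(vectors[0])
    return [sum(w * v[i] for w, v in zip(weights, vectors)) for i in range(dim)]

def guided_attention(guide, vectors):
    # Assumed scoring: dot product between the guide and each candidate vector.
    weights = softmax([dot(guide, v) for v in vectors])
    return weighted_sum(weights, vectors), weights

def progressive_attention(question_vec, knowledge_vec, video):
    # Assumed fusion: element-wise sum of question and knowledge features.
    guide = [q + k for q, k in zip(question_vec, knowledge_vec)]
    # Spatial step: attend over the object features inside each frame.
    frame_vecs = [guided_attention(guide, objects)[0] for objects in video]
    # Temporal step: attend over the resulting per-frame vectors.
    return guided_attention(guide, frame_vecs)

# Hypothetical toy inputs: 2 frames, 2 objects per frame, 2-dimensional features.
question_vec = [1.0, 0.0]
knowledge_vec = [0.5, 0.0]
video = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 0.5]],
]
video_vec, temporal_weights = progressive_attention(question_vec, knowledge_vec, video)
print(video_vec, temporal_weights)
```

In this toy run the first frame contains an object aligned with the guide vector, so it receives the larger temporal weight. A real system would use learned projections in place of raw dot products and feed the attended video vector to an answer decoder.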