Method for solving video question and answer tasks needing common knowledge by using question-knowledge guided progressive space-time attention network



Bibliographic details
Main authors: ZHAO ZHOU, ZHANG PINHAN, JIN WEIKE, CHEN MOSHA
Format: Patent
Language: Chinese; English
Description
Abstract: The invention discloses a method for solving video question-answering tasks that require commonsense knowledge by using a question-knowledge guided progressive space-time attention network. The method comprises the following steps: for a video, a set of video objects is obtained with Faster-RCNN; annotation text corresponding to the video object set is retrieved from an external knowledge base to obtain external knowledge; semantic features of the external knowledge are extracted with Doc2Vec to obtain the knowledge feature set of the video; for the question, each input word is converted into a word embedding vector by an embedding layer; and the word embedding vectors are input into the progressive space-time attention network to generate an answer. With this additional information, more specific questions, such as commonsense questions, can be answered. The external knowledge and the question are combined to guide progressive attention over the video in both the spatial and temporal dimensions, and fine-grained joint video representation...
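The abstract names the pipeline stages but not the attention equations. As a rough illustration only, the sketch below assumes that the Faster-RCNN object features and the Doc2Vec knowledge vectors are already computed and share one dimension d. It mean-pools the question word embeddings (a stand-in for whatever question encoder the patent actually uses), fuses question and knowledge by simple addition, and applies scaled dot-product attention first over the objects within each frame (space) and then over the resulting frame vectors (time). All function names (`embed_question`, `guide_vector`, `attend`, `progressive_attention`) and the fusion rule are hypothetical and are not taken from the patent.

```python
import math
from typing import List

Vector = List[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: List[float]) -> List[float]:
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(guide: Vector, items: List[Vector]) -> Vector:
    """Scaled dot-product attention: a softmax-weighted sum of `items` scored against `guide`."""
    d = len(guide)
    weights = softmax([dot(guide, it) / math.sqrt(d) for it in items])
    return [sum(w * it[i] for w, it in zip(weights, items)) for i in range(d)]


def embed_question(token_ids: List[int], table: List[Vector]) -> Vector:
    # Embedding layer as a lookup table; mean pooling is a hypothetical question encoder.
    vectors = [table[t] for t in token_ids]
    d = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(d)]


def guide_vector(question: Vector, knowledge: List[Vector]) -> Vector:
    # Hypothetical fusion: question vector plus the mean of the knowledge (Doc2Vec-style) vectors.
    d = len(question)
    mean_k = [sum(k[i] for k in knowledge) / len(knowledge) for i in range(d)]
    return [q + k for q, k in zip(question, mean_k)]


def progressive_attention(
    video: List[List[Vector]], question: Vector, knowledge: List[Vector]
) -> Vector:
    """video[t][n] is the feature vector of object n detected in frame t."""
    guide = guide_vector(question, knowledge)
    # Stage 1 (space): summarise each frame by attending over its detected objects.
    frame_vectors = [attend(guide, objects) for objects in video]
    # Stage 2 (time): summarise the clip by attending over the per-frame summaries.
    return attend(guide, frame_vectors)


if __name__ == "__main__":
    table = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]  # toy 3-word vocabulary, d = 2
    question = embed_question([0, 2], table)
    knowledge = [[0.2, 0.1], [0.0, 0.3]]  # toy knowledge vectors
    video = [
        [[1.0, 0.0], [0.0, 1.0]],  # frame 0: two objects
        [[0.8, 0.2], [0.1, 0.9]],  # frame 1: two objects
    ]
    print(progressive_attention(video, question, knowledge))
```

The returned vector would then feed an answer generator, which the abstract does not describe and the sketch omits. Attending over space before time is one plausible reading of "progressive"; the patented network may, for example, refine the guide vector between the two stages.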