Hierarchical Conditional Relation Networks for Multimodal Video Question Answering

Video Question Answering (Video QA) challenges modelers on multiple fronts. Modeling video necessitates building not only spatio-temporal models for the dynamic visual channel but also multimodal structures for associated information channels such as subtitles or audio. Video QA adds at least two mo...

Bibliographic details
Published in:	International Journal of Computer Vision, 2021-11, Vol. 129 (11), pp. 3027-3050
Authors:	Le, Thao Minh; Le, Vuong; Venkatesh, Svetha; Tran, Truyen
Format:	Article
Language:	English
Keywords:	Artificial Intelligence, Audio data, Channels, Complexity, Computer Imaging, Computer Science, Domains, Image Processing and Computer Vision, Linguistics, Pattern Recognition, Pattern Recognition and Graphics, Questions, Subtitles & subtitling, Vision