Triple Multimodal Cyclic Fusion and Self-Adaptive Balancing for Video Q&A Systems

The performance of Video Question and Answer (VQA) systems depends on capturing key information from both the visual images and the natural language in context in order to generate relevant answers to questions. However, traditional linear combinations of multimodal features capture only shallow feature interactions, fa...

Bibliographic details
Published in:	Computers, materials & continua, 2022-01, Vol.73 (3), p.6407
Main authors:	Zhang, Xiliang; Liu, Jin; Li, Yue; Wu, Zhongdai; Wang, Y Ken
Format:	Article
Language:	English
Subjects:	Ablation; Algorithms; Balancing; Datasets; Language; Natural language processing; Questions; Semantics
Online access:	Full text