Bridging the Cross-Modality Semantic Gap in Visual Question Answering

The objective of visual question answering (VQA) is to adequately comprehend a question and identify relevant contents in an image that can provide an answer. Existing approaches in VQA often combine visual and question features directly to create a unified cross-modality representation for answer i...

Bibliographic details
Published in: IEEE Transactions on Neural Networks and Learning Systems, 2024-03, Vol. PP, p. 1-13
Main authors: Wang, Boyue; Ma, Yujian; Li, Xiaoyan; Gao, Junbin; Hu, Yongli; Yin, Baocai
Format: Article
Language: English