Bridging the Cross-Modality Semantic Gap in Visual Question Answering

The objective of visual question answering (VQA) is to adequately comprehend a question and identify relevant contents in an image that can provide an answer. Existing approaches in VQA often combine visual and question features directly to create a unified cross-modality representation for answer i...

Bibliographic details
Published in:	IEEE Transactions on Neural Networks and Learning Systems, 2024-03, Vol. PP, p. 1-13
Main authors:	Wang, Boyue; Ma, Yujian; Li, Xiaoyan; Gao, Junbin; Hu, Yongli; Yin, Baocai
Format:	Article
Language:	English
Subjects:	Caption bridge; contrastive learning; cross-modality analysis; visual question answering (VQA)