Pre-training Model Based on Parallel Cross-Modality Fusion Layer

Visual Question Answering (VQA) is a learning task that combines computer vision with natural language processing. In VQA, it is important to understand the alignment between visual concepts and linguistic semantics. In this paper, we propose a Pre-training Model Based on Parallel Cross-Modality Fu...

Bibliographic Details
Published in:	PLoS ONE 2022-02, Vol.17 (2), p.e0260784
Main authors:	Li, Xuewei; Han, Dezhi; Chang, Chin-Chen
Format:	Article
Language:	English
Subjects:	Ablation; Algorithms; Biology and Life Sciences; Coders; Cognitive tasks; Computational linguistics; Computer and Information Sciences; Computer vision; Consortia; Datasets; Humans; Language; Language processing; Machine vision; Modelling; Natural language interfaces; Natural Language Processing; Questions; Semantics; Social Sciences; Training
Online access:	Full text