Assessing Modality Bias in Video Question Answering Benchmarks with Multimodal Large Language Models

Multimodal large language models (MLLMs) can simultaneously process visual, textual, and auditory data, capturing insights that complement human analysis. However, existing video question-answering (VidQA) benchmarks and datasets often exhibit a bias toward a single modality, despite the goal of req...

Ausführliche Beschreibung

Gespeichert in:

Bibliographische Detailangaben
Hauptverfasser:	Park, Jean, Jang, Kuk Jin, Alasaly, Basam, Mopidevi, Sriharsha, Zolensky, Andrew, Eaton, Eric, Lee, Insup, Johnson, Kevin
Format:	Artikel
Sprache:	eng
Schlagworte:	Computer Science - Artificial Intelligence Computer Science - Computation and Language Computer Science - Learning
Online-Zugang:	Volltext bestellen
Tags:	Tag hinzufügen Keine Tags, Fügen Sie den ersten Tag hinzu!

Schreiben Sie den ersten Kommentar!