PhD: A ChatGPT-Prompted Visual Hallucination Evaluation Dataset
Format: Article
Language: English
Abstract: Multimodal Large Language Models (MLLMs) hallucinate, resulting in an emerging topic of visual hallucination evaluation (VHE). This paper contributes a ChatGPT-Prompted visual hallucination evaluation Dataset (PhD) for objective VHE at a large scale. The essence of VHE is to ask an MLLM questions about specific images to assess its susceptibility to hallucination. Depending on what to ask (objects, attributes, sentiment, etc.) and how the questions are asked, we structure PhD along two dimensions, i.e., task and mode. Five visual recognition tasks, ranging from low-level (object / attribute recognition) to middle-level (sentiment / position recognition and counting), are considered. Besides a normal visual QA mode, which we term PhD-base, PhD also asks questions with inaccurate context (PhD-iac), with incorrect context (PhD-icc), or with AI-generated counter-common-sense images (PhD-ccs). We construct PhD by a ChatGPT-assisted semi-automated pipeline, encompassing four pivotal modules: task-specific hallucinatory item (hitem) selection, hitem-embedded question generation, inaccurate / incorrect context generation, and counter-common-sense (CCS) image generation. With over 14k daily images, 750 CCS images and 102k VQA triplets in total, PhD reveals considerable variability in MLLMs' performance across various modes and tasks, offering valuable insights into the nature of hallucination. As such, PhD stands as a potent tool not only for VHE but may also play a significant role in the refinement of MLLMs.
DOI: 10.48550/arxiv.2403.11116
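
The abstract above outlines PhD's structure (five recognition tasks crossed with four question modes, 102k VQA triplets in total) but the record does not specify how the triplets are stored or scored. The following Python sketch is only an illustration of one plausible layout: the field names (image_path, question, answer, task, mode), the yes/no answer convention, and the answer_fn model wrapper are assumptions for illustration, not the authors' released format.

```python
# Minimal sketch of how PhD-style VQA triplets could be represented and scored.
# Field names, the yes/no answer convention, and answer_fn are assumptions;
# the actual PhD release may use a different schema.
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Iterable

TASKS = ["object", "attribute", "sentiment", "position", "counting"]
MODES = ["base", "iac", "icc", "ccs"]  # normal QA / inaccurate ctx / incorrect ctx / CCS images


@dataclass
class VQATriplet:
    image_path: str  # a daily image, or an AI-generated CCS image when mode == "ccs"
    question: str    # hitem-embedded question, optionally prefixed with context
    answer: str      # expected "yes" or "no"
    task: str        # one of TASKS
    mode: str        # one of MODES


def evaluate(triplets: Iterable[VQATriplet],
             answer_fn: Callable[[str, str], str]) -> dict:
    """Return accuracy per (mode, task) pair, given an MLLM wrapper answer_fn(image, question)."""
    correct, total = defaultdict(int), defaultdict(int)
    for t in triplets:
        pred = answer_fn(t.image_path, t.question).strip().lower()
        total[(t.mode, t.task)] += 1
        correct[(t.mode, t.task)] += int(pred.startswith(t.answer.lower()))
    return {key: correct[key] / total[key] for key in total}


if __name__ == "__main__":
    # Toy run with a stub model that always answers "yes".
    sample = [VQATriplet("img_001.jpg", "Is there a dog in the image?", "no",
                         "object", "base")]
    print(evaluate(sample, lambda image, question: "yes"))
```

Grouping accuracy by (mode, task) mirrors the abstract's observation that MLLM performance varies considerably across PhD's modes and tasks.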