Decompose and Compare Consistency: Measuring VLMs' Answer Reliability via Task-Decomposition Consistency Comparison
Saved in:
Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Summary: Despite tremendous advancements, current state-of-the-art Vision-Language
Models (VLMs) are still far from perfect. They tend to hallucinate and may
generate biased responses. In such circumstances, having a way to assess the
reliability of a given response generated by a VLM is quite useful. Existing
methods, such as estimating uncertainty using answer likelihoods or
prompt-based confidence generation, often suffer from overconfidence. Other
methods use self-consistency comparison but are affected by confirmation
biases. To alleviate these, we propose Decompose and Compare Consistency (DeCC)
for reliability measurement. By comparing the consistency between the direct
answer generated using the VLM's internal reasoning process, and the indirect
answers obtained by decomposing the question into sub-questions and reasoning
over the sub-answers produced by the VLM, DeCC measures the reliability of
the VLM's direct answer. Experiments across six vision-language tasks with
three VLMs show that DeCC's reliability estimates correlate better with task
accuracy than existing methods.
DOI: 10.48550/arxiv.2407.07840
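The decompose-and-compare idea described in the summary can be illustrated with a minimal sketch. This is not the paper's implementation: the agreement measure (simple normalized exact match) and the example answers are hypothetical stand-ins for the VLM outputs and the consistency comparison DeCC actually uses.

```python
def decc_reliability(direct_answer, indirect_answers):
    """Toy reliability score: the fraction of indirect answers (each reasoned
    from a decomposed sub-question) that agree with the direct answer.
    Agreement here is naive case-insensitive exact match, purely for
    illustration of the consistency-comparison idea."""
    if not indirect_answers:
        return 0.0
    agree = sum(
        1 for a in indirect_answers
        if a.strip().lower() == direct_answer.strip().lower()
    )
    return agree / len(indirect_answers)

# Hypothetical example: the direct answer agrees with two of the three
# answers obtained via sub-question decomposition.
score = decc_reliability("a red bicycle",
                         ["A red bicycle", "a red bicycle", "a blue car"])
print(round(score, 2))  # → 0.67
```

In the actual method, both the direct answer and the indirect answers come from the same VLM, and the consistency between them serves as the reliability estimate for the direct answer.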