Devils in Middle Layers of Large Vision-Language Models: Interpreting, Detecting and Mitigating Object Hallucinations via Attention Lens
Main authors: | , , , , , |
---|---|
Format: | Article |
Language: | eng |
Subjects: | |
Online access: | Order full text |
Summary: | Hallucinations in Large Vision-Language Models (LVLMs) significantly
undermine their reliability, motivating researchers to explore the causes of
hallucination. However, most studies primarily focus on the language aspect
rather than the visual. In this paper, we address how LVLMs process visual
information and whether this process causes hallucination. First, we use the
attention lens to identify the stages at which LVLMs handle visual data,
discovering that the middle layers are crucial. Moreover, we find that these
layers can be further divided into two stages: "visual information enrichment"
and "semantic refinement", which respectively propagate visual data to object
tokens and interpret it through text. By analyzing attention patterns during
the visual information enrichment stage, we find that real tokens consistently
receive higher attention weights than hallucinated ones, serving as a strong
indicator of hallucination. Further examination of multi-head attention maps
reveals that hallucination tokens often result from heads interacting with
inconsistent objects. Based on these insights, we propose a simple
inference-time method that adjusts visual attention by integrating information
across various heads. Extensive experiments demonstrate that this approach
effectively mitigates hallucinations in mainstream LVLMs without additional
training costs. |
---|---|
DOI: | 10.48550/arxiv.2411.16724 |
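
The summary describes two ideas that a small example can make concrete: scoring generated tokens by how much attention they pay to image tokens, and adjusting visual attention by combining information across heads. The following is a minimal sketch under stated assumptions, not the authors' implementation. The function names, the toy attention values, the threshold of 0.5, and the choice of a plain cross-head mean are all hypothetical and chosen only for illustration. The record does not specify layers, thresholds, or the exact integration rule.

```python
from typing import Dict, List, Sequence


def visual_attention_score(head_rows: Sequence[Sequence[float]],
                           image_positions: Sequence[int]) -> float:
    """Mean over heads of the attention mass one token places on image positions.

    head_rows[h][k] is the attention weight from one generated token to key
    position k in head h, taken from a single (assumed middle) layer.
    """
    per_head = [sum(row[i] for i in image_positions) for row in head_rows]
    return sum(per_head) / len(per_head)


def flag_low_visual_attention(scores: Dict[str, float],
                              threshold: float) -> List[str]:
    """Return tokens whose score is below a hand-picked (assumed) threshold."""
    return [tok for tok, s in scores.items() if s < threshold]


def average_image_attention(head_rows: Sequence[Sequence[float]],
                            image_positions: Sequence[int]) -> List[List[float]]:
    """Hypothetical adjustment: give every head the cross-head mean weight on
    each image position, then renormalize each row so it sums to 1."""
    n_heads = len(head_rows)
    mean_img = {i: sum(row[i] for row in head_rows) / n_heads
                for i in image_positions}
    adjusted = []
    for row in head_rows:
        new_row = [mean_img.get(k, w) for k, w in enumerate(row)]
        total = sum(new_row)
        adjusted.append([w / total for w in new_row])
    return adjusted


# Toy data: key positions 0-1 are image tokens, 2-3 are text tokens.
image_positions = [0, 1]
attention = {
    "cat": [[0.4, 0.3, 0.2, 0.1], [0.5, 0.2, 0.2, 0.1]],    # attends to the image
    "dog": [[0.05, 0.05, 0.5, 0.4], [0.3, 0.0, 0.4, 0.3]],  # mostly attends to text
}

scores = {tok: visual_attention_score(rows, image_positions)
          for tok, rows in attention.items()}
flagged = flag_low_visual_attention(scores, threshold=0.5)
adjusted = average_image_attention(attention["dog"], image_positions)

print(flagged)  # ['dog']
```

In this toy setup the token with little attention on image positions is the one flagged, which mirrors the summary's observation that real tokens receive higher visual attention than hallucinated ones. With a real model, the per-head rows would come from the attention weights of the layers under study rather than from hand-written lists.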