Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models
Saved in:

Main authors:
Format: Article
Language: English
Subjects:
Online access: Order full text
Abstract: Hallucination is often regarded as a major impediment to using large language models (LLMs), especially for knowledge-intensive tasks. Even when the training corpus consists solely of true statements, language models still generate hallucinations in the form of amalgamations of multiple facts. We coin this phenomenon "knowledge overshadowing": when we query knowledge from a language model with multiple conditions, some conditions overshadow others, leading to hallucinated outputs. This phenomenon partially stems from training data imbalance, which we verify on both pretrained and fine-tuned models across a wide range of language model families and sizes. From a theoretical point of view, knowledge overshadowing can be interpreted as over-generalization of the dominant conditions (patterns). We show that the hallucination rate grows with both the imbalance ratio (between the popular and unpopular conditions) and the length of the dominant condition's description, consistent with our derived generalization bound. Finally, we propose to use overshadowing conditions as a signal to catch hallucination before it is produced, along with a training-free self-contrastive decoding method to alleviate hallucination during inference. Our proposed approach achieves up to 82% F1 for hallucination anticipation and 11.2% to 39.4% hallucination control across different models and datasets.
DOI: 10.48550/arxiv.2407.08039